Description

Background of the Invention

Mutagenesis is a powerful tool in the study of protein structure and function. Mutations can be made in
the nucleotide sequence of a cloned gene encoding a protein of interest and the modified gene can be
expressed to produce mutants of the protein. By comparing the properties of a wild-type protein and the
mutants generated, it is often possible to identify individual amino acids or domains of amino acids that are
essential for the structural integrity and/or biochemical function of the protein, such as its binding and/or
catalytic activity.

Mutagenesis, however, is beset by several limitations. Among these are the large number of mutants
that can be generated and the practical inability to select from these the mutants that will be informative or
have a desired property. For instance, there is no reliable way to predict whether the substitution, deletion
or insertion of a particular amino acid in a protein will have a local or global effect on the protein, and
therefore, whether it will be likely to yield useful information or function.

Because of these limitations, attempts to improve properties of a protein by mutagenesis have relied
mostly on the generation and analysis of mutations that are restricted to specific, putatively important
regions of the protein, such as regions at or around the active site of the protein. But, even though
mutations are restricted to certain regions of a protein, the number of potential mutations can be extremely
large, making it difficult or impossible to identify and evaluate those produced. For example, substitution of
a single amino acid position with all the other naturally occurring amino acids yields 19 different variants of
a protein. If several positions are substituted at once, the number of variants increases exponentially. For
substitution with all amino acids at seven amino acid positions of a protein, 19 x 19 x 19 x 19 x 19 x 19 x 19,
or 8.9 x 10^8, variants of the protein are generated, from which useful mutants must be selected. It follows
that, for an effective use of mutagenesis, the type and number of mutations must be subjected to some
restrictive criteria which keep the number of mutant proteins generated to a number suitable for screening.
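The count quoted above can be checked with a short calculation. The following is only an illustrative sketch: the function name `substitution_variants` and its parameters are assumptions introduced here, and the count follows the text in allowing 19 replacement amino acids at each substituted position.

```python
def substitution_variants(positions: int, alternatives: int = 19) -> int:
    """Number of variants when each of `positions` sites is replaced by one of
    `alternatives` amino acids (19 = the other naturally occurring ones)."""
    if positions < 0:
        raise ValueError("positions must be non-negative")
    return alternatives ** positions


if __name__ == "__main__":
    # One position gives 19 variants; seven positions give about 8.9 x 10^8.
    for n in (1, 2, 7):
        print(n, substitution_variants(n))
```

The exponential growth with the number of positions is what makes screening impractical without some restriction on the mutations introduced.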

A method of mutagenesis that has been developed to produce very specific mutations in a protein is
site-directed mutagenesis. The method is most useful for studying small sites known or suspected to be
involved in a particular protein function. In this method, nucleotide substitutions (point mutations) are made
at defined locations in a DNA sequence in order to bring about a desired substitution of one amino acid for
another in the encoded amino acid sequence. The method is oligonucleotide-mediated. A synthetic
oligonucleotide is constructed that is complementary to the DNA encoding the region of the protein where
the mutation is to be made, but which bears an unmatched base(s) at the desired position(s) of the base
substitution(s). The mutated oligonucleotide is used to prime the synthesis of a new DNA strand which
incorporates the change(s) and, therefore, leads to the synthesis of the mutant gene. See Zoller, M. J. and
Smith, M., Meth. Enzymol. 100, 468 (1983).

Variations of site-directed mutagenesis have been developed to optimize aspects of the procedure. For
the most part, they are based on the original methods of Hutchinson, C.A. et al., J. Biol. Chem. 253:6551
(1978) and Razin, A. et al., Proc. Natl. Acad. Sci. USA 75:4268 (1978). For an extensive description of
site-directed mutagenesis, see Molecular Cloning, A Laboratory Manual, 1989, Sambrook, Fritsch and Maniatis,
Cold Spring Harbor, New York, chapter 15.

A method of mutagenesis designed to produce a larger number of mutations is "saturation"
mutagenesis. This process is also oligonucleotide-mediated. In this method, all possible point mutations
(nucleotide substitutions) are made at one or more positions within DNA encoding a given region of a
protein. These mutations are made by synthesizing a single mixture of oligonucleotides which is inserted
into the gene in place of the natural segment of DNA encoding the region. At each step in the synthesis, the
three non-wild type nucleotides are incorporated into the oligonucleotides along with the wild type
nucleotide. The non-wild type nucleotides are incorporated at a predetermined percentage, so that all
possible variations of the sequence are produced with anticipated frequency. In this way, all possible
nucleotide substitutions are made within a defined region of a gene, resulting in the production of many
mutant proteins in which the amino acids of a defined region vary randomly (Oliphant, A.R. et al., Meth.
Enzymol. 155:568 (1987)).

Methods of random mutagenesis, such as saturation mutagenesis, are designed to compensate for the
inability to predict where mutations should be made to yield useful information or functional mutants. The
methods are based on the principle that, by generating all or a large number of the possible variants of
relevant protein domains, the proper arrangement of amino acids is likely to be produced as one of the
randomly generated mutants. However, for completely random combinations of mutations, the numbers of
mutants generated can overwhelm the capacity to select meaningfully. In practice, the number of random
mutations generated must be large enough to be likely to yield the desired mutations, but small enough so
that the capacity of the selection system is not exceeded. This is not always possible given the size and
complexity of most proteins.

Summary of the Invention

This invention pertains to a method of mutagenesis for the generation of novel or improved proteins (or
polypeptides) and to libraries of mutant proteins and specific mutant proteins generated by the method. The
protein, peptide or polypeptide targeted for mutagenesis can be a natural, synthetic or engineered protein,
peptide or polypeptide or a variant (e.g., a mutant). In one embodiment, the method comprises introducing
a predetermined amino acid into each and every position in a predefined region (or several different
regions) of the amino acid sequence of a protein. A protein library is generated which contains mutant
proteins having the predetermined amino acid in one or more positions in the region and, collectively, in
every position in the region. The method can be referred to as "walk-through" mutagenesis because, in
effect, a single, predetermined amino acid is substituted position-by-position throughout a defined region of
a protein. This allows for a systematic evaluation of the role of a specific amino acid in the structure or
function of a protein.

The library of mutant proteins can be generated by synthesizing a single mixture of oligonucleotides
which encodes all of the designed variations of the amino acid sequence for the region containing the
predetermined amino acid. This mixture of oligonucleotides is synthesized by incorporating in each
condensation step of the synthesis both the nucleotide of the sequence to be mutagenized (for example,
the wild type sequence) and the nucleotide required for the codon of the predetermined amino acid. Where
a nucleotide of the sequence to be mutagenized is the same as a nucleotide for the predetermined amino
acid, no additional nucleotide is added. In the resulting mixture, oligonucleotides which contain at least one
codon for the predetermined amino acid make up from about 12.5% to 100% of the constituents. In
addition, the mixture of oligonucleotides encodes a statistical (in some cases Gaussian) distribution of
amino acid sequences containing the predetermined amino acid in a range of no positions to all positions in
the sequence.

The mixture of oligonucleotides is inserted into a gene encoding the protein to be mutagenized (such as
the wild type protein) in place of the DNA encoding the region. The recombinant mutant genes are cloned
in a suitable expression vector to provide an expression library of mutant proteins that can be screened for
proteins that have desired properties. The library of mutant proteins produced by this
oligonucleotide-mediated procedure contains a larger ratio of informative mutants (those containing the
predetermined amino acid in the defined region) relative to non-informative mutants than libraries produced
by methods of saturation mutagenesis. For example, preferred libraries are made up of mutants which have the
predetermined amino acid in essentially each and every position in the region at a frequency ranging from about
12.5% to 100%.

This method of mutagenesis can be used to generate libraries of mutant proteins which are of a
practical size for screening. The method can be used to study the role of specific amino acids in protein
structure and function and to develop new or improved proteins and polypeptides such as enzymes,
antibodies, binding fragments or analogues thereof, single chain antibodies and catalytic antibodies.

Brief Description of the Figures

Figure 1 is a schematic depiction of a "walk-through" mutagenesis of the Fv region of immunoglobulin
MCPC 603, performed for the CDR1 (Asp) and CDR3 (Ser) of the heavy (H) chain and CDR2 (His) of the
light (L) chain.

Figure 2 is a schematic depiction of a "walk-through" mutagenesis of an enzyme active site; three
amino acid regions of the active site are substituted in each and every position with amino acids of a
serine-protease catalytic triad.

Figure 3 illustrates the design of "degenerate" oligonucleotides for walk-through mutagenesis of the
CDR1 (Figure 3a) and CDR3 (Figure 3b) of the heavy chain, and CDR2 (Figure 3c) of the light chain of
MCPC 603.

Figure 4 illustrates the design of a "window" of mutagenesis, and shows the sequences of degenerate
oligonucleotides for mutation of CDR3 of the heavy chain (Figure 4a) and CDR2 of the light chain of MCPC
603 (Figure 4b).

Figures 5a and 5b illustrate the design of "windows" of mutagenesis and show the sequences of
degenerate oligonucleotides for two different walk-through mutagenesis procedures with His in CDR2 of the
heavy chain of MCPC 603.

Figure 6 illustrates the design and sequences of degenerate oligonucleotides for walk-through 
mutagenesis of CDR2 of the heavy chain of MCPC 603. 

Figure 7 illustrates a "window" of mutagenesis in the HIV protease, consisting of three consecutive
amino acid residues at the catalytic site. The design and sequences of degenerate oligonucleotides for
three rounds of walk-through mutagenesis of the region with Asp, Ser and His are shown.

Figure 8 illustrates the design and sequence of degenerate oligonucleotides for walk-through
mutagenesis of five CDRs of MCPC 603. The degenerate oligonucleotides for walk-through mutagenesis of
the CDR1 (Figure 8a) and CDR3 (Figure 8b) of the light chain, and of CDR1 (Figure 8c), CDR2 (Figure 8d),
and CDR3 (Figure 8e) of the heavy chain are shown.

Detailed Description of the Invention

The study of proteins has revealed that certain amino acids play a crucial role in their structure and
function. For example, it appears that only a discrete number of amino acids participate in the catalytic
event of an enzyme. Serine proteases are a family of enzymes present in virtually all organisms, which
have evolved a structurally similar catalytic site characterized by the combined presence of serine, histidine
and aspartic acid. These amino acids form a catalytic triad which, possibly along with other determinants,
stabilizes the transition state of the substrate. The functional role of this catalytic triad has been confirmed
by individual and by multiple substitutions of serine, histidine and aspartic acid by site-directed
mutagenesis of serine proteases and the importance of the interplay between these amino acid residues in
catalysis is now well established. These same three amino acids are involved in the enzymatic mechanism
of certain lipases as well. Similarly, a large number of other types of enzymes are characterized by the
peculiar conformation of their catalytic site and the presence of certain kinds of amino acid residues in the
site that are primarily responsible for the catalytic event. For an extensive review, see Enzyme Structure
and Mechanism, 1985, by A. Fersht, Freeman Ed., New York.

Though it is clear that certain amino acids are critical to the mechanism of catalysis, it is difficult, if not
impossible, to predict which position (or positions) an amino acid must occupy to produce a functional site
such as a catalytic site. Unfortunately, the complex spatial configuration of amino acid side chains in
proteins and the interrelationship of different side chains in the catalytic pocket of enzymes are insufficiently
understood to allow for such predictions. As pointed out above, selective (site-directed) mutagenesis and
saturation mutagenesis are of limited utility for the study of protein structure and function in view of the
enormous number of possible variations in complex proteins.

The method of this invention provides a systematic and practical approach for evaluating the importance
of particular amino acids, and their position within a defined region of a protein, to the structure or
function of a protein and for producing useful proteins. The method begins with the assumption that a
certain, predetermined amino acid is important to a particular structure or function. The assumption can be
based on a mere guess. More likely, the assumption is based upon what is known about the amino acid
from the study of other proteins. For example, the amino acid can be one which has a role in catalysis,
binding or another function.

With selection of the predetermined amino acid, a library of mutants of the protein to be studied is 
generated by incorporating the predetermined amino acid into each and every position of the region of the 
protein. As schematically depicted in Figures 1 and 2, the amino acid is substituted in or "walked-through" 
all (or essentially all) positions of the region. 

The library of mutant proteins contains individual proteins which have the predetermined amino acid in
each and every position in the region. The protein library will have a higher proportion of mutants that
contain the predetermined amino acid in the region (relative to mutants that do not), as compared to
libraries that would be generated by completely random mutation, such as saturation mutation. Thus, the
desired types of mutants are concentrated in the library. This is important because it allows more and larger
regions of proteins to be mutagenized by the walk-through process, while still yielding libraries of a size
which can be screened. Further, if the initial assumption is correct and the amino acid is important to the
structure or function of the protein, then the library will have a higher proportion of informative mutants than
a library generated by random mutation.

In another embodiment, a predetermined amino acid is introduced into each of certain selected
positions within a predefined region or regions. Certain selected positions may be known or thought to be
more promising due to structural constraints. Such considerations, based on structural information or
modeling of the molecule mutagenized and/or the desired structure, can be used to select a subset of
positions within a region or regions for mutagenesis. Thus, the amino acids mutagenized within a region
need not be contiguous. Walking an amino acid through certain selected positions in a region can minimize
the number of variants produced.

The size of a library will vary depending upon the length and number of regions and amino acids within
a region that are mutagenized. Preferably, the library will be designed to contain less than 10^° mutants,
and more preferably less than 10^ mutants.

In a preferred embodiment, the library of mutant proteins is generated by synthesizing a mixture of
oligonucleotides (a degenerate oligonucleotide) encoding selected permutations of amino acid sequences
for the defined region of the protein. Conveniently, the mixture of oligonucleotides can be produced in a
single synthesis. This is accomplished by incorporating, at each position within the oligonucleotide, both a
nucleotide required for synthesis of the wild-type protein (or other protein to be mutagenized) and a single
appropriate nucleotide required for a codon of the predetermined amino acid. (This differs from the
oligonucleotides produced in saturation mutagenesis in that, for each DNA position mutagenized, only a
single additional nucleotide, as opposed to three for "saturation", is added.) The two nucleotides are
typically, but not necessarily, used in approximately equal concentrations for the reaction so that there is an
equal chance of incorporating either one into the sequence at the position. When the nucleotide of the wild
type sequence and the nucleotide for the codon of the predetermined amino acid are the same, no
additional nucleotide is incorporated.
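The per-position rule just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to design the oligonucleotides shown in the figures: the function name `degenerate_positions`, the example sequence `ACTGGT` and the choice of `GAC` as the codon for the predetermined amino acid are all hypothetical.

```python
def degenerate_positions(wild_type_dna: str, target_codon: str) -> list[str]:
    """For each nucleotide of the wild-type region, list the base(s) to couple.

    A position gets two bases (wild type + target codon base) when they differ,
    and a single base when they are the same, as described in the text.
    """
    if len(target_codon) != 3:
        raise ValueError("target_codon must have exactly three bases")
    if len(wild_type_dna) % 3 != 0:
        raise ValueError("wild_type_dna must contain whole codons")
    mix = []
    for i, wt_base in enumerate(wild_type_dna):
        target_base = target_codon[i % 3]  # same codon walked through every position
        mix.append("".join(sorted({wt_base, target_base})))
    return mix


if __name__ == "__main__":
    # Hypothetical two-codon region (ACT GGT) walked through with GAC.
    print(degenerate_positions("ACTGGT", "GAC"))
```

In this sketch each two-letter entry stands for an equimolar mixture of the two bases, and a one-letter entry means that no additional nucleotide is incorporated at that step.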

Depending upon the number of nucleotides that are mutated to provide a codon for a predetermined
amino acid, the mixture of oligonucleotides will generate a limited number of new codons. For example, if
only one nucleotide is mutated, the resulting DNA mixture will encode either the original codon or the codon
of the predetermined amino acid. In this case, 50% of all oligonucleotides in the resulting mixture will
contain the codon for the predetermined amino acid at that position. If two nucleotides are mutated in any
combination (first and second, first and third or second and third), four different codons are possible and at
least one will encode the predetermined amino acid, a frequency of at least 25%. If all three bases are
mutated, then the mixture will produce eight distinct codons, one of which will encode the predetermined
amino acid. Therefore the codon will appear in the position with a minimum frequency of 12.5%. However, it
is likely that an additional one of the eight codons would code for the same amino acid and/or a stop codon
and accordingly, the frequency of the predetermined amino acid would be greater than 12.5%.
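The 50%, 25% and 12.5% figures can be reproduced by enumerating the codons in the mixture. The sketch below assumes equal concentrations of the two bases at every mutated position and uses the standard genetic code; the function names and the example codons are assumptions chosen for illustration.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code for DNA codons; "*" marks a stop codon.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_mixture(wt_codon: str, target_codon: str) -> list[str]:
    """All codons formed when each position offers the wild-type and target base."""
    choices = [sorted({w, t}) for w, t in zip(wt_codon, target_codon)]
    return ["".join(c) for c in product(*choices)]


def codon_frequency(wt_codon: str, target_codon: str) -> float:
    """Fraction of the mixture carrying exactly the target codon."""
    codons = codon_mixture(wt_codon, target_codon)
    return codons.count(target_codon) / len(codons)


def amino_acid_frequency(wt_codon: str, target_codon: str) -> float:
    """Fraction of the mixture encoding the target amino acid by any codon."""
    codons = codon_mixture(wt_codon, target_codon)
    wanted = CODON_TABLE[target_codon]
    return sum(CODON_TABLE[c] == wanted for c in codons) / len(codons)


if __name__ == "__main__":
    for wt, target in (("ACG", "AGG"), ("ACT", "AGA"), ("ACT", "GAC")):
        print(wt, target, codon_frequency(wt, target), amino_acid_frequency(wt, target))
```

For the hypothetical pair ACT (Thr) and GAC (Asp), all three bases differ, so the target codon itself makes up one eighth of the mixture, but GAT also encodes Asp, which is the situation the text describes when it says the amino acid frequency can exceed 12.5%.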

By this method, a mixture of oligonucleotides is produced having a high proportion of sequences
containing a codon for the predetermined amino acid. Other restrictions in the synthesis can be imposed to
increase this proportion (by reducing the number of oligonucleotides in the mixture that do not contain at
least one codon for the predetermined amino acid). For example, when a complete codon (three
nucleotides) must be substituted to arrive at the codon for the predetermined amino acid, the substitute
nucleotides only may be introduced (so that the codon for the predetermined amino acid appears with
100% frequency at the position). The proportions of the wild type nucleotide and the nucleotide coding for
the preselected amino acid may be adjusted at any or all positions to influence the proportions of the
encoded amino acids.

In a protein library produced by this procedure, the proportion of mutants which have at least one
residue of the predetermined amino acid in the defined region ranges from about 12.5% to 100% of all
mutants in the library (assuming approximately equal proportions of wild type bases and preselected amino
acid bases are used in the synthesis). Typically, the proportion ranges from about 25% to 50%.

The libraries of protein mutants will contain a number equal to or smaller than 2^n, where n represents
the number of nucleotides mutated within the DNA encoding the protein region. Because there can be only
a limited number of changes for each codon (one, two or three) the number of protein mutants will range
from 2^m to 8^m, where m is the number of amino acids that are mutated within that region. This represents a
dramatic reduction compared with the 19^m mutants generated by a saturation mutagenesis. For instance, for
a protein region of seven amino acids, the number of mutants generated by a walk-through mutagenesis (of
one amino acid) would be 0.000014% to 0.24% of the number of mutants that would be
generated by saturation mutagenesis of the region, a very significant reduction.
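The comparison for a seven-residue region can be worked through numerically. This is a rough sketch using only the bounds stated in the text (2^m to 8^m for walk-through, 19^m for saturation); the function names are assumptions made for illustration.

```python
def walk_through_bounds(m: int) -> tuple[int, int]:
    """Lower and upper bound on library size for m mutated amino acid positions."""
    return 2 ** m, 8 ** m


def saturation_size(m: int) -> int:
    """Number of mutants from saturation mutagenesis, as counted in the text."""
    return 19 ** m


def percent_of_saturation(m: int) -> tuple[float, float]:
    """Walk-through library size as a percentage of the saturation library size."""
    low, high = walk_through_bounds(m)
    total = saturation_size(m)
    return 100.0 * low / total, 100.0 * high / total


if __name__ == "__main__":
    low, high = percent_of_saturation(7)
    print(f"{low:.6f}% to {high:.2f}%")
```

For m = 7 the bounds are 128 and 2,097,152 mutants against roughly 8.9 x 10^8, which comes out at about 0.000014% and about 0.23%, close to the range quoted above.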

An additional, advantageous characteristic of the library generated by this method is that the proteins
which contain the predetermined amino acid conform to a statistical distribution with respect to the number
of residues of the amino acid in the amino acid sequence. Accordingly, the sequences range from those in
which the predetermined amino acid does not appear at any position in the region to those in which the
predetermined amino acid appears in every position in the region. Thus, in addition to providing a means
for systematic insertion of an amino acid into a region of a protein, this method provides a way to enrich a
region of a protein with a particular amino acid. This enrichment could lead to enhancement of an activity
attributable to the amino acid or to entirely new activities.
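One way to picture this distribution is to compute, for a given region, the probability that a mutant carries 0, 1, 2, ... copies of the predetermined codon. The sketch below is illustrative only and rests on assumptions not stated in the text: positions are treated as independent, the two bases are used in equal amounts, and a position where k bases are mutated carries the target codon with probability 0.5^k (the codon-level minimum). The function name is hypothetical.

```python
def target_count_distribution(mutated_bases_per_codon: list[int]) -> list[float]:
    """Probability of finding 0..m target codons in a region of m codons.

    `mutated_bases_per_codon[i]` is the number of bases (0 to 3) that differ
    between the wild-type codon and the target codon at position i.
    """
    probs = [0.5 ** k for k in mutated_bases_per_codon]
    dist = [1.0]  # before any position is considered: zero target codons, certainty
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for count, weight in enumerate(dist):
            nxt[count] += weight * (1.0 - p)   # this position keeps a non-target codon
            nxt[count + 1] += weight * p       # this position carries the target codon
        dist = nxt
    return dist


if __name__ == "__main__":
    # Hypothetical region of three codons, each differing from the target by one base.
    print(target_count_distribution([1, 1, 1]))
```

With one mutated base at each of three positions the result is the familiar binomial pattern, with the all-wild-type and all-target sequences at the two extremes.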

The mixture of oligonucleotides for generation of the library can be synthesized readily by known
methods for DNA synthesis. The preferred method involves use of solid phase beta-cyanoethyl
phosphoramidite chemistry. See U.S. Patent No. 4,725,677. For convenience, an instrument for automated
DNA synthesis can be used containing ten reagent vessels of nucleotide synthons (reagents for DNA
synthesis): four vessels each containing one of the four synthons (A, T, C and G) and six vessels containing
mixtures of two synthons (A + T, A + C, A + G, T + C, T + G and C + G).
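The ten-vessel arrangement follows directly from the four single synthons plus every unordered pair of them. As a small sketch (the function names `reagent_vessels` and `vessel_for` are assumptions, and no real synthesizer interface is implied), the vessel needed at each coupling step can be looked up as follows.

```python
from itertools import combinations

SYNTHONS = "ATCG"


def reagent_vessels() -> list[str]:
    """Four single-synthon vessels followed by the six two-synthon mixtures."""
    singles = list(SYNTHONS)
    pairs = [f"{a}+{b}" for a, b in combinations(SYNTHONS, 2)]
    return singles + pairs


def vessel_for(wt_base: str, target_base: str) -> str:
    """Vessel to draw from when the wild-type and target bases are as given."""
    wanted = {wt_base, target_base}
    for vessel in reagent_vessels():
        if set(vessel.replace("+", "")) == wanted:
            return vessel
    raise ValueError(f"no vessel for bases {wt_base!r} and {target_base!r}")


if __name__ == "__main__":
    print(reagent_vessels())
    print(vessel_for("C", "A"), vessel_for("G", "G"))
```

A step where the wild-type and target bases coincide draws from a single-synthon vessel; any other step draws from one of the six mixtures.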

The wild type nucleotide sequence can be adjusted during synthesis to simplify the mixture of
oligonucleotides and minimize the number of amino acids encoded. For example, if the wild type amino
acid is threonine (ACT), and the preselected amino acid is arginine (AGA or AGG), two base changes are
required to encode arginine, and three amino acids are produced (e.g., AGA, Arg; AGT, Ser; ACA, ACT,
Thr). By changing the wild type nucleotide sequence to ACA or ACG, only a single base change would be
required to encode arginine. Thus, if ACG were chosen to encode the wild type threonine instead of ACT,
only the central base would need to be changed to G to obtain arginine, and only arginine and threonine
would be produced at that position. Depending on the particular codon and the identity of the preselected
amino acid, similar adjustments at any position of the wild type codon may reduce the number of variants
generated.
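The threonine-to-arginine example can be generalized by searching the synonymous codons of both amino acids for the pair needing the fewest base changes. The following is a hedged sketch using the standard genetic code; the function names and the tie-breaking rule (fewest changes, then fewest encoded amino acids, then alphabetical order) are assumptions introduced here.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code for DNA codons; "*" marks a stop codon.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def encoded_amino_acids(wt_codon: str, target_codon: str) -> set[str]:
    """Amino acids (one-letter codes) encoded by the two-base-per-position mixture."""
    choices = [sorted({w, t}) for w, t in zip(wt_codon, target_codon)]
    return {CODON_TABLE["".join(c)] for c in product(*choices)}


def best_codon_pair(wt_amino: str, target_amino: str) -> tuple[int, int, str, str]:
    """Return (base changes, amino acids encoded, wild-type codon, target codon)."""
    wt_codons = sorted(c for c, a in CODON_TABLE.items() if a == wt_amino)
    target_codons = sorted(c for c, a in CODON_TABLE.items() if a == target_amino)
    candidates = []
    for wt in wt_codons:
        for tg in target_codons:
            changes = sum(w != t for w, t in zip(wt, tg))
            candidates.append((changes, len(encoded_amino_acids(wt, tg)), wt, tg))
    return min(candidates)


if __name__ == "__main__":
    print(sorted(encoded_amino_acids("ACT", "AGA")))  # Thr codon ACT walked with Arg AGA
    print(best_codon_pair("T", "R"))                  # best synonymous choice for Thr -> Arg
```

For Thr and Arg the search finds a one-base solution (ACA with AGA, or equivalently ACG with AGG), in agreement with the adjustment described above.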

The mixture of oligonucleotides is inserted into a cloned gene of the protein being mutagenized in place
of the nucleotide sequence encoding the amino acid sequence of the region to produce recombinant mutant
genes encoding the mutant proteins. To facilitate this, the mixture of oligonucleotides can be made to
contain flanking recognition sites for restriction enzymes. See Crea, R., U.S. Patent No. 4,888,286. The
recognition sites are designed to correspond to recognition sites which either exist naturally or are
introduced in the gene proximate to the DNA encoding the region. After conversion into double stranded
form, the oligonucleotides are ligated into the gene by standard techniques. By means of an appropriate
vector, the genes are introduced into a host cell suitable for expression of the mutant proteins. See e.g.,
Huse, W.D. et al., Science 246:1275 (1989); Viera, J. et al., Meth. Enzymol. 153:3 (1987).

In fact, the degenerate oligonucleotides can be introduced into the gene by any suitable method, using
techniques well-known in the art. In cases where the amino acid sequence of the protein to be mutagenized
is known or where the DNA sequence is known, gene synthesis is a possible approach (see e.g.,
Alvarado-Urbina, G. et al., Biochem. Cell. Biol. 64: 548-555 (1986); Jones et al., Nature 321: 522 (1986)). For
example, partially overlapping oligonucleotides, typically about 20-60 nucleotides in length, can be
designed. The internal oligonucleotides (B through G and I through O) are phosphorylated using T4
polynucleotide kinase to provide a 5' phosphate group. Each of the oligonucleotides can be annealed to
its complementary partner to give a double-stranded DNA molecule with single-stranded extensions
useful for further annealing. The annealed pairs can then be mixed together and ligated to form a full length
double-stranded molecule:

    A B C D E F G H
    I J K L M N O P

Convenient restriction sites can be designed near the ends of the synthetic gene for cloning into a suitable
vector. The full length molecules can be cleaved with those restriction enzymes, gel purified, electroeluted
and ligated into a suitable vector. Convenient restriction sites can also be incorporated into the sequence of
the synthetic gene to facilitate introduction of mutagenic cassettes.

As an alternative to synthesizing oligonucleotides representing the full-length double-stranded gene,
oligonucleotides which partially overlap at their 3' ends (i.e., with complementary 3' ends) can be
assembled into a gapped structure and then filled in with the Klenow fragment of DNA polymerase and
deoxynucleotide triphosphates to make a full length double-stranded gene. Typically, the overlapping
oligonucleotides are from 40-90 nucleotides in length. The extended oligonucleotides are then ligated using
T4 ligase. Convenient restriction sites can be introduced at the ends and/or internally for cloning purposes.
Following digestion with an appropriate restriction enzyme or enzymes, the gene fragment is gel-purified
and ligated into a suitable vector. Alternatively, the gene fragment could be blunt end ligated into an
appropriate vector.

In these approaches, if convenient restriction sites are available (naturally or engineered) following gene
assembly, the degenerate oligonucleotides can be introduced subsequently by cloning the cassette into an
appropriate vector. Alternatively, the degenerate oligonucleotides can be incorporated at the stage of gene
assembly. For example, when both strands of the gene are fully chemically synthesized, overlapping and
complementary degenerate oligonucleotides can be produced. Complementary pairs will anneal with each
other. An example of this approach is illustrated in Example 1.

When partially overlapping oligos are used in the gene assembly, a set of degenerate nucleotides can
also be directly incorporated in place of one of the oligos. The appropriate complementary strand is
synthesized during the extension reaction from a partially complementary oligo from the other strand by
enzymatic extension with the Klenow fragment of DNA polymerase, for example. Incorporation of the
degenerate oligonucleotides at the stage of synthesis also simplifies cloning where more than one domain
of a gene is mutagenized.

In another approach, the gene of interest is present on a single stranded plasmid. For example, the
gene can be cloned into an M13 phage vector or a vector with a filamentous phage origin of replication
which allows propagation of single-stranded molecules with the use of a helper phage. The single-stranded
template can be annealed with a set of degenerate probes. The probes can be elongated and ligated, thus
incorporating each variant strand into a population of molecules which can be introduced into an appropriate
host (Sayers, J.R. et al., Nucleic Acids Res. 16: 791-802 (1988)). This approach can circumvent multiple
cloning steps where multiple domains are selected for mutagenesis.

Polymerase chain reaction (PCR) methodology can also be used to incorporate degenerate
oligonucleotides into a gene. For example, the degenerate oligonucleotides themselves can be used as
primers for extension.







In this embodiment, A and B are populations of degenerate oligonucleotides encoding the mutagenic
cassettes or "windows", and the windows are complementary to each other (the zig-zag portion of the
oligos represents the degenerate portion). A and B also contain wild type sequences complementary to the
template on the 3' end for amplification and are thus primers for amplification capable of generating
fragments incorporating a window. C and D are oligonucleotides which can amplify the entire gene or region
of interest, including those with mutagenic windows incorporated (Steffan, N.H. et al.,
Gene 77: 51-59 (1989)). The extension products primed from A and B can hybridize through their
complementary windows and provide a template for production of full-length molecules using C and D as
primers. C and D can be designed to contain convenient sites for cloning. The amplified fragments can then
be cloned.

Libraries of mutants generated by any of the above techniques or other suitable techniques can be
screened to identify mutants of desired structure or activity. The screening can be done by any appropriate
means. For example, catalytic activity can be ascertained by suitable assays for substrate conversion and
binding activity can be evaluated by standard immunoassay and/or affinity chromatography.

The method of this invention can be used to mutagenize any region of a protein, protein subunit or
polypeptide. The description heretofore has centered around proteins, but it should be understood that the
method applies to polypeptides and multi-subunit proteins as well. The regions mutagenized by the method
of this invention can be continuous or discontinuous and will generally range in length from about 3 to about
30 amino acids, typically 5 to 20 amino acids.

Usually, the region studied will be a functional domain of the protein such as a binding or catalytic
domain. For example, the region can be the hypervariable region (complementarity-determining region or
CDR) of an immunoglobulin, the catalytic site of an enzyme, or a binding domain.

As mentioned, the amino acid chosen for the "walk through" mutagenesis is generally selected from
those known or thought to be involved in the structure or function of interest. The twenty naturally occurring
amino acids differ only with respect to their side chain. Each side chain is responsible for chemical
properties that make each amino acid unique. For review, see Principles of Protein Structure, 1988, by G.E.
Schulz and R. M. Schirmer, Springer-Verlag.

From the chemical properties of the side chains, it appears that only a selected number of natural
amino acids preferentially participate in a catalytic event. These amino acids belong to the group of polar
and neutral amino acids such as Ser, Thr, Asn, Gln, Tyr, and Cys, the group of charged amino acids, Asp
and Glu, Lys and Arg, and especially the amino acid His.

Typical polar and neutral side chains are those of Cys, Ser, Thr, Asn, Gln and Tyr. Gly is also
considered to be a borderline member of this group. Ser and Thr play an important role in forming
hydrogen bonds. Thr has an additional asymmetry at the beta carbon, therefore only one of the
stereoisomers is used. The acid amides Gln and Asn can also form hydrogen bonds, the amido groups
functioning as hydrogen donors and the carbonyl groups functioning as acceptors. Gln has one more CH2
group than Asn which renders the polar group more flexible and reduces its interaction with the main chain.
Tyr has a very polar hydroxyl group (phenolic OH) that can dissociate at high pH values. Tyr behaves
somewhat like a charged side chain; its hydrogen bonds are rather strong.

Neutral polar amino acids are found at the surface as well as inside protein molecules. As internal residues,
they usually form hydrogen bonds with each other or with the polypeptide backbone. Cys can form disulfide
bridges.

Histidine (His) has a heterocyclic aromatic side chain with a pK value of 6.0. In the physiological pH
range, its imidazole ring can be either uncharged or charged, after taking up a hydrogen ion from the
solution. Since these two states are readily available, His is quite suitable for catalyzing chemical reactions.
It is found in most of the active centers of enzymes.
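
The availability of both states near neutral pH can be illustrated with the Henderson-Hasselbalch relation. The sketch below is only an illustration: it assumes an isolated side chain with the pK of 6.0 quoted above (the effective pK of a His residue inside a folded protein can differ), and the function name is hypothetical.

```python
def fraction_protonated(pH, pKa=6.0):
    """Fraction of imidazole side chains carrying the extra proton (charged form).

    Henderson-Hasselbalch for a single titratable group; pKa = 6.0 is the
    value quoted in the text for free histidine and is an assumption for
    any particular protein environment.
    """
    return 1.0 / (1.0 + 10 ** (pH - pKa))


if __name__ == "__main__":
    for pH in (5.0, 6.0, 7.0, 7.4, 8.0):
        print(f"pH {pH:>4}: {fraction_protonated(pH):.3f} protonated")
```

Under these assumptions the charged form is half populated at pH 6.0 and still present at a few percent at pH 7.4, so both states remain accessible in the physiological range.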

Asp and Glu are negatively charged at physiological pH. Because of its short side chain, the carboxyl
group of Asp is rather rigid with respect to the main chain. This may be the reason why the carboxyl group
in many catalytic sites is provided by Asp and not by Glu. Charged amino acids are generally found at the surface
of a protein.

In addition, Lys and Arg are found at the surface. They have long and flexible side chains. Wobbling in
the surrounding solution, they increase the solubility of the protein globule. In several cases, Lys and Arg
take part in forming internal salt bridges or they help in catalysis. Because of its exposure at the surface
of proteins, Lys is a residue more frequently attacked by enzymes which either modify the side chain or
cleave the peptide chain at the carbonyl end of Lys residues.

For the purpose of introducing catalytically important amino acids into a region, the invention preferentially
relates to a mutagenesis in which the predetermined amino acid is one of the following group of amino
acids: Ser, Thr, Asn, Gln, Tyr, Cys, His, Glu, Asp, Lys, and Arg. However, for the purpose of altering binding
or creating new binding affinities, any of the twenty naturally occurring amino acids can be selected.

Importantly, several different regions or domains of a protein can be mutagenized simultaneously. The
same or a different amino acid can be "walked-through" each region. This enables the evaluation of amino
acid substitutions in conformationally related regions such as the regions which, upon folding of the protein,
are associated to make up a functional site such as the catalytic site of an enzyme or the binding site of an
antibody. This method provides a way to create modified or completely new catalytic sites. As depicted in
Figure 1, the six hypervariable regions of an immunoglobulin, which make up the unique aspects of the
antigen binding site (Fv region), can be mutagenized simultaneously, or separately within the VH or VL
chains, to study the three-dimensional interrelationship of selected amino acids in this site.

The method of this invention opens up new possibilities for the design of many different types of
proteins. The method can be used to improve upon an existing structure or function of a protein. For
example, the introduction of additional "catalytically important" amino acids into a catalytic domain of an
enzyme may result in enhanced catalytic activity toward the same substrate. Alternatively, entirely new
structures, specificities or activities may be introduced into a protein. De novo synthesis of enzymatic
activity can be achieved as well. The new structures can be built on the natural "scaffold" of an existing
protein by mutating only relevant regions by the method of this invention.

The method of this invention is especially useful for modifying antibody molecules. As used herein,
antibody molecules or antibodies refers to antibodies or portions thereof, such as full-length antibodies, Fv
molecules, or other antibody fragments, individual chains or fragments thereof (e.g., a single chain of Fv),
single chain antibodies, and chimeric antibodies. Alterations can be introduced into the variable region
and/or into the framework (constant) region of an antibody. Modification of the variable region can produce
antibodies with better antigen binding properties and catalytic properties. Modification of the framework
region could lead to the improvement of chemo-physical properties, such as solubility or stability, which
would be useful, for example, in commercial production. Typically, the mutagenesis will target the Fv region
of the immunoglobulin molecule, the structure responsible for antigen-binding activity, which is made up of
variable regions of two chains, one from the heavy chain (VH) and one from the light chain (VL).

The method of this invention is suited to the design of catalytic proteins, particularly catalytic
antibodies. Presently, catalytic antibodies can be prepared by an adaptation of standard somatic cell fusion
techniques. In this process, an animal is immunized with an antigen that resembles the transition state of
the desired substrate to induce production of an antibody that binds the transition state and catalyzes the
reaction. Antibody-producing cells are harvested from the animal and fused with an immortalizing cell to
produce hybrid cells. These cells are then screened for secretion of an antibody that catalyzes the reaction.
This process is dependent upon the availability of analogues of the transition state of a substrate. The
process may be limited because such analogues are likely to be difficult to identify or synthesize in most
cases.

The method of this invention provides a different approach which eliminates the need for a transition
state analogue. By the method of this invention, an antibody can be made catalytic by the introduction of
suitable amino acids into the binding site of an immunoglobulin (Fv region). The antigen-binding site (Fv)
region is made up of six hypervariable (CDR) loops, three derived from the immunoglobulin heavy chain (H)
and three from the light chain (L), which connect beta strands within each subunit. The amino acid residues
of the CDR loops contribute almost entirely to the binding characteristics of each specific monoclonal
antibody. For instance, catalytic triads modeled after serine proteases can be created in the hypervariable
segments of the Fv region of an antibody and screened for proteolytic activity.

The method of this invention can be used to produce many different enzymes or catalytic antibodies,
including oxidoreductases, transferases, hydrolases, lyases, isomerases and ligases. Among these classes,
of particular importance will be the production of improved proteases, carbohydrases, lipases, dioxygenases
and peroxidases. These and other enzymes that can be prepared by the method of this invention have
important commercial applications for enzymatic conversions in health care, cosmetics, foods, brewing,
detergents, environment (e.g., wastewater treatment), agriculture, tanning, textiles, and other chemical
processes. These include, but are not limited to, diagnostic and therapeutic applications, conversions of
fats, carbohydrates and protein, degradation of organic pollutants and synthesis of chemicals. For example,
therapeutically effective proteases with fibrinolytic activity, or activity against viral structures necessary for
infectivity, such as viral coat proteins, could be engineered. Such proteases could be useful anti-thrombotic
agents or anti-viral agents against viruses such as the AIDS virus, rhinoviruses, influenza, or hepatitis. In the case of
oxygenases (e.g., dioxygenases), a class of enzymes requiring a co-factor for oxidation of aromatic rings
and other double bonds, possible industrial applications of novel proteins include biopulping processes,
conversion of biomass into fuels or other chemicals, conversion of waste water contaminants,
bioprocessing of coal, and detoxification of hazardous organic compounds.

Assays for these activities can be designed in which a cell requires the desired activity for growth. For
example, in screening for activities that degrade toxic compounds, the incorporation of lethal levels of
the toxic compound into nutrient plates would permit the growth only of cells expressing an activity which
degrades the toxic compound (Wasserfallen, A., Rekik, M., and Harayama, S., Biotechnology 9: 296-298
(1991)). Alternatively, in screening for an enzyme that uses a non-toxic substrate, it is possible to use that
substrate as the sole carbon source or sole source of another appropriate nutrient. In this case also, only
cells expressing the enzyme activity will grow on the plates. In these methods, it is not necessary that the
enzyme activity be secreted if the substrate or a product of the substrate (converted extracellularly by
another activity) can be taken up by the cell. In addition, one can test directly for a novel function by
incorporating a substrate into the medium which when acted upon leads to a visual indication of activity,
Illustrations of Walk-through Mutagenesis

Model I

To further illustrate the invention, a "walk-through" mutagenesis of three of the hypervariable regions or
complementarity determining regions (CDRs) of the monoclonal antibody MCPC 603 is described. CDR1 and
CDR3 of the heavy chain (VH) and CDR2 of the light chain region (VL) were the domains selected for
walk-through mutagenesis. For this embodiment, the amino acids selected are the three residues of the catalytic
triad of serine proteases, Asp, His and Ser. Asp was selected for VH CDR1, Ser was selected for VH CDR3,
and His was selected for VL CDR2.

MCPC 603 is a monoclonal antibody that binds phosphorylcholine. This immunoglobulin is recognized
as a good model for investigating binding and catalysis because the protein and its binding region have
been well characterized structurally. The CDRs for the MCPC 603 antibody have been identified. In the
heavy chain, CDR1 spans amino acids 31-35, CDR2 spans 50-69, and CDR3 spans 101-111. In the light
chain, the amino acids of CDR1 are 24-40, CDR2 spans amino acids 55-62, and CDR3 spans amino acids
95-103. The amino acid numbers in the Figures correspond to the numbers of the amino acids in the parent
MCPC 603 molecule.

The cDNA corresponding to an immunoglobulin variable region can be directly cloned and sequenced
without constructing cDNA libraries. Because immunoglobulin variable region genes are flanked by
conserved sequences, a polymerase chain reaction (PCR) can be used to amplify, clone and sequence both
the light and heavy chain genes from a small number of hybridoma cells with the use of consensus 5' and
3' primers. See Chiang, Y.L. et al., BioTechniques 7:360 (1989). Furthermore, the DNA coding for the amino
acids flanking the CDR regions can be mutagenized by site directed mutagenesis to generate restriction
enzyme recognition sites useful for further "cassette" mutagenesis. See U.S. Patent No. 4,888,286, supra.

To facilitate insertion of the degenerate oligonucleotides, the mixture is synthesized to contain flanking
recognition sites for the same restriction enzymes. The degenerate mixture can be first converted into
double stranded DNA by enzymatic methods (Oliphant, A.R. et al., Gene 44:177 (1986)) and then inserted
into the gene of the region to be mutagenized in place of the CDR nucleotide sequence encoding the
naturally-occurring (wild type) amino acid sequence.

Alternatively, one of the other approaches described above, such as a gene synthesis approach, could
be used to make a library of plasmids encoding variants in the desired regions. The published amino acid
sequence of the MCPC 603 VH and VL regions can be converted to a DNA sequence (Rudikoff, S. and
Potter, M., Biochemistry 13: 4033 (1974)). Note that the wild type DNA sequence of MCPC 603 has also
been published (Pluckthun, A. et al., Cold Spring Harbor Symp. Quant. Biol., Vol. LII: 105-112 (1987)).

Restriction sites can be incorporated into the sequence to facilitate introduction of degenerate
oligonucleotides or the degenerate sequences may be introduced at the stage of gene assembly.

The design of the oligonucleotides for walk-through mutagenesis in the CDRs of MCPC 603 is shown in
Figure 3. In each case, the positions or "windows" to be mutagenized are shown. It is understood that the
oligonucleotide synthesized can be larger than the window shown to facilitate insertion into the target
construct. The mixture of oligonucleotides corresponding to the VH CDR1 is designed so that each amino
acid of the wild type sequence is substituted by Asp (Figure 3a). Two codons specify Asp (GAC and GAT).
The first codon of CDR1 does not require any substitution. The second codon (TTC, Phe) requires
substitution at the first (T to G) and second position (T to A) in order to convert it into a codon for Asp. The
third codon (TAC, Tyr) requires only one substitution at the first position (T to G). The fourth codon (ATG,
Met) requires three substitutions, the first being A to G, the second T to A and the third G to T. The fifth
codon (GAG, Glu) requires only one substitution at the third position (G to T). The resulting mixture of
oligonucleotides is depicted below.

        TT  T   ATG   G
5'- GAC   C  AC     GA  -3'
        GA  G   GAT   T

This represents a mixture of 2^7 = 128 different oligonucleotide sequences.

From the genetic code, it is possible to deduce all the amino acids that will substitute the original amino
acid in each position. For this case, the first amino acid will always be Asp (100%); the second will be Phe
(25%), Asp (25%), Tyr (25%) or Val (25%); the third amino acid will be Tyr (50%) or Asp (50%); the fourth
will be Met (12.5%), Asp (12.5%), Val (25%), Glu (12.5%), Asn (12.5%), Ile (12.5%) or Lys (12.5%); and the
fifth codon will be either Glu (50%) or Asp (50%). In total, 128 oligonucleotides which will code for 112
different protein sequences (1 x 4 x 2 x 7 x 2 = 112) are generated. Among the 112 different amino acid
sequences generated will be the wild type sequence (which has an Asp residue at position 31), and
sequences differing from wild type in that they contain from one to four Asp residues at positions 32-35, in
all possible permutations (see Figure 3a). In addition, some sequences, either with or without Asp
substitutions, will contain an amino acid, neither wild type nor Asp, at positions 32, 34 or both. These
amino acids are introduced by permutations of the nucleotides which encode the wild type amino acid and
the preselected amino acid. For example, in Figure 3a, at position 32, tyrosine (Tyr) and valine (Val) are
generated in addition to the wild type phenylalanine (Phe) residue and the preselected Asp residue.
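
The counts above can be checked by enumerating the degenerate codons directly. The following is a minimal sketch, assuming equal proportions of the two bases at each mixed position and the standard genetic code; the function and variable names are illustrative only and do not come from the specification.

```python
from collections import Counter
from itertools import product
from math import prod

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def degenerate_codon(wild, target):
    """Per-position base sets mixing the wild-type codon with the target codon."""
    return [sorted({w, t}) for w, t in zip(wild, target)]


def amino_acid_frequencies(base_sets):
    """Amino acid frequencies when the listed bases are supplied in equal ratio."""
    codons = ["".join(c) for c in product(*base_sets)]
    counts = Counter(CODON_TABLE[c] for c in codons)
    return {aa: n / len(codons) for aa, n in counts.items()}


# Wild-type VH CDR1 codons (positions 31-35) and the Asp codon used at each position.
WILD = ["GAC", "TTC", "TAC", "ATG", "GAG"]
ASP = ["GAC", "GAC", "GAC", "GAT", "GAT"]

window = [degenerate_codon(w, t) for w, t in zip(WILD, ASP)]
n_oligos = prod(len(bases) for codon in window for bases in codon)
freqs = [amino_acid_frequencies(codon) for codon in window]
n_proteins = prod(len(f) for f in freqs)

if __name__ == "__main__":
    print(n_oligos, "oligonucleotides,", n_proteins, "protein sequences")
    for position, f in zip(range(31, 36), freqs):
        print(position, f)
```

The enumeration counts seven two-base positions (hence 2^7 oligonucleotides) and multiplies the number of distinct amino acids per position to obtain the number of protein sequences.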

The CDR3 of the VH region of MCPC603 is made up of 11 amino acids, as shown in Figure 3b. A
mixture of oligonucleotides is designed in which each non-serine amino acid of the wild type sequence is
replaced by serine (Ser), as described above for CDR1. Six codons (TCX and AGC, AGT) specify Ser. The
substitutions required throughout the wild-type sequence amount to 12. As a result, the oligonucleotide
mixture produced contains 2^12 = 4096 different oligonucleotides which, in this case, will code for 4096
protein sequences. Among these sequences will be some containing a single serine residue (in addition to
the serine 105) in any one of the other positions (101-104, 106-111), as well as variants with more than one
serine, in any combination (see Figure 3b).
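
The codon counts quoted for the preselected amino acids follow from the standard genetic code, as the short sketch below shows. The helper name `codons_for` is hypothetical and is used only for this illustration.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(amino_acid):
    """All codons that specify the given one-letter amino acid, sorted."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


if __name__ == "__main__":
    for name, letter in (("Asp", "D"), ("His", "H"), ("Ser", "S")):
        print(name, codons_for(letter))
    # Twelve two-base positions give 2**12 oligonucleotides in the VH CDR3 mixture.
    print(2 ** 12)
```

Having six Ser codons to choose from gives some freedom to pick, at each position, the Ser codon that needs the fewest base changes from the wild-type codon.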

The CDR2 of the VL region of MCPC603 contains eight amino acids (56-63). Seven of these amino
acids (56-62) were selected for walk-through mutagenesis as depicted in Figure 3c. The mixture of
oligonucleotides is designed so that each amino acid of the wild type sequence will be replaced by
histidine (His). Two codons (CAT and CAC) specify His. The substitutions required throughout the wild-type
DNA sequence total 13. Thus, the oligonucleotide mixture produced contains 2^13 = 8192 oligonucleotides
which specify 8192 different peptide sequences (see Figure 3c).

As a result of this mutagenesis method, by the synthesis and the use of three oligonucleotide mixtures, a
library of Fv sequences can be produced which contains 112 x 4096 x 8192 = 3.76 x 10^9 different protein
sequences. A significant proportion of these sequences will encode the amino acid triad His, Ser, Asp
typical of serine proteases within the hypervariable regions.
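
Because the three windows are mutagenized independently, the library size is simply the product of the per-window counts. A brief arithmetic check (the dictionary keys are labels chosen for this illustration):

```python
from math import prod

# Number of distinct protein sequences per window in Model I, as stated in the text.
model_i_windows = {
    "VH CDR1 (Asp walk-through)": 112,
    "VH CDR3 (Ser walk-through)": 4096,
    "VL CDR2 (His walk-through)": 8192,
}

library_size = prod(model_i_windows.values())

if __name__ == "__main__":
    print(f"{library_size:,} sequences ({library_size:.2e})")
```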

The synthesis of the degenerate mixture of oligonucleotides can be conveniently carried out in an
automated DNA synthesizer programmed to deliver either one nucleotide to the reaction chamber or a
mixture of two nucleotides in equal ratio, mixed prior to delivery to the reaction chamber. An alternative
synthetic procedure would involve premixing two different nucleotides in a reagent vessel. A total of 10
reagent vessels, four containing the individual bases and the remaining six containing all of the
possible two-base mixtures among the four bases, can be employed to synthesize any mixture of
oligonucleotides for this mutagenesis process. For example, the DNA synthesizer can be designed to
contain the following ten chambers:



Chamber    Synthon
1          A
2          T
3          C
4          G
5          (A + T)
6          (A + C)
7          (A + G)
8          (T + C)
9          (T + G)
10         (C + G)



With this arrangement, any nucleotide can be replaced by a combination of two nucleotides at any position 
of the sequence. 

The following sequence of reactions is required to synthesize the desired mixture of degenerate 
oligonucleotides for: 

VH CDR1: 4, 1, 3, 9, 5, 3, 9, 1, 3, 7, 5, 9, 4, 1, 9

VH CDR3: 1, 7, 3, 2, 6, 3, 2, 6, 3, 7, 4, 3, 1, 4, 3, 1, 10, 2, 2, 10, 4, 2, 6, 3, 2, 8, 3, 9, 6, 3, 9, 8, 2

VL CDR2: 10, 7, 2, 10, 6, 2, 6, 7, 3, 6, 6, 3, 3, 7, 2, 10, 1, 5, 8, 6, 2
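
The chamber sequence for a window can be derived mechanically from the wild-type and target codons, using the ten-chamber assignment listed above. The sketch below is illustrative only; the function names are hypothetical, and the VH CDR3 and VL CDR2 programs are copied from the text rather than derived, since their wild-type codons appear only in the Figures.

```python
# Chamber number -> base(s) delivered, as in the ten-chamber table above.
CHAMBERS = {
    1: "A", 2: "T", 3: "C", 4: "G",
    5: "AT", 6: "AC", 7: "AG", 8: "TC", 9: "TG", 10: "CG",
}
# Reverse lookup: set of bases wanted at a position -> chamber number.
CHAMBER_FOR = {frozenset(bases): number for number, bases in CHAMBERS.items()}


def chamber_program(wild_codons, target_codons):
    """Chamber to draw from at each nucleotide position, read 5' to 3'."""
    program = []
    for wild, target in zip(wild_codons, target_codons):
        for w, t in zip(wild, target):
            program.append(CHAMBER_FOR[frozenset({w, t})])
    return program


def n_oligonucleotides(program):
    """Each two-base chamber doubles the number of sequences in the mixture."""
    mixed_steps = sum(1 for chamber in program if len(CHAMBERS[chamber]) == 2)
    return 2 ** mixed_steps


# VH CDR1: wild-type codons and the Asp codon chosen at each position (Model I).
VH_CDR1 = chamber_program(
    ["GAC", "TTC", "TAC", "ATG", "GAG"],
    ["GAC", "GAC", "GAC", "GAT", "GAT"],
)
# Programs copied from the text.
VH_CDR3 = [1, 7, 3, 2, 6, 3, 2, 6, 3, 7, 4, 3, 1, 4, 3, 1, 10, 2,
           2, 10, 4, 2, 6, 3, 2, 8, 3, 9, 6, 3, 9, 8, 2]
VL_CDR2 = [10, 7, 2, 10, 6, 2, 6, 7, 3, 6, 6, 3, 3, 7, 2, 10, 1, 5, 8, 6, 2]

if __name__ == "__main__":
    print(VH_CDR1)
    for name, program in (("VH CDR1", VH_CDR1), ("VH CDR3", VH_CDR3), ("VL CDR2", VL_CDR2)):
        print(name, n_oligonucleotides(program))
```

Counting the two-base steps in each program gives 7, 12 and 13 mixed positions, which agrees with the 128, 4096 and 8192 oligonucleotides stated for the three windows.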
As an alternative to this procedure, if mixing of individual bases in the lines of the oligonucleotide
synthesizer is possible, the machine can be programmed to draw from two or more reservoirs of pure
bases to generate the desired proportion of nucleotides.

Each mixture of synthetic oligonucleotides can be inserted into the gene for the respective MCPC 603
variable region. The oligonucleotides can be converted into double-stranded chains by enzymatic
techniques (see, e.g., Oliphant, A.R. et al., 1986, supra) and then ligated into a restricted plasmid containing the
gene coding for the protein to be mutagenized. The restriction sites could be naturally occurring sites or
engineered restriction sites.

The mutant MCPC 603 genes constructed by these or other suitable procedures described above can
be expressed in a convenient E. coli expression system, such as that described by Pluckthun and Skerra
(Pluckthun, A. and Skerra, A., Meth. Enzymol. 178: 476-515 (1989); Skerra, A. et al., Biotechnology 9: 273-278
(1991)). The mutant proteins can be expressed for secretion in the medium and/or in the cytoplasm of
the bacteria, as described by M. Better and A. Horwitz, Meth. Enzymol. 178: 476 (1989).

These and other Fv variants, or antibody variants produced by the present method, can also be
produced in other microorganisms such as yeast, or in mammalian cells, such as myeloma or hybridoma
cells. The Fv variants can be produced as individual VH and VL fragments, as single chains (see Huston,
J.S. et al., Proc. Natl. Acad. Sci. USA 85: 5879-5883 (1988)), as parts of larger molecules such as Fab, or
as entire antibody molecules.

In a preferred embodiment, the single domains encoding VH and VL are each attached to the 3' end of
a sequence encoding a signal sequence, such as the ompA, phoA or pelB signal sequence (Lei, S.P. et al.,
J. Bacteriol. 169: 4379 (1987)). These gene fusions are assembled in a dicistronic construct, so that they
can be expressed from a single vector, and secreted into the periplasmic space of E. coli where they will
refold and can be recovered in active form (Skerra, A. et al., Biotechnology 9: 273-278 (1991)). The mutant
VH genes can be concurrently expressed with wild-type VL to produce Fv variants, or as described, with
mutagenized VL to further increase the number and structural variety of the protein mutants.

Screening of these variants for acquisition of a proteolytic function can be accomplished in an assay as
described below for the HIV protease variants (see also Example 4). Note also that since the catalytic triad
of Asp-His-Ser has also been implicated in the mechanism of certain lipases, variants with lipase function
may also be generated.

Model II

In a second model designed to generate a serine protease in the MCPC 603 Fv structure, Asp is
selected for VH CDR1, His for VH CDR3, and Ser for VL CDR2. In this case, the degenerate
oligonucleotides designed for the VH CDR1 Asp walk-through from Model I can be reused, illustrating the
interchangeable nature of the walk-through cassettes (Figure 3a).

For the His walk-through of VH CDR3, the nucleotides required to specify histidine codons are
introduced at positions 101-111 of the VH region. Figure 4a illustrates this walk-through procedure. Note
that in this and other examples, the percentages of His produced are calculated for the case where
approximately equal proportions of the wild-type or His nucleotide are introduced. These proportions can be
adjusted to influence the frequency with which various amino acids are produced.
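
The effect of adjusting the proportions can be sketched numerically. The example below is an illustration under stated assumptions, not a procedure from the specification: it assumes the same target-base fraction at every mixed position of a codon, and it uses the Phe codon TTC mutated toward the Asp codon GAC (the second codon of the Model I VH CDR1 window), because the wild-type codons of VH CDR3 appear only in Figure 4a. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def weighted_frequencies(wild, target, p_target):
    """Amino acid probabilities when the target base is supplied with
    probability p_target (wild-type base with 1 - p_target) at every
    position where the two codons differ."""
    options = []
    for w, t in zip(wild, target):
        if w == t:
            options.append([(w, 1.0)])
        else:
            options.append([(w, 1.0 - p_target), (t, p_target)])
    freqs = defaultdict(float)
    for combo in product(*options):
        codon = "".join(base for base, _ in combo)
        weight = 1.0
        for _, p in combo:
            weight *= p
        freqs[CODON_TABLE[codon]] += weight
    return dict(freqs)


if __name__ == "__main__":
    for p in (0.5, 0.8):
        print(p, weighted_frequencies("TTC", "GAC", p))
```

With equal proportions the preselected amino acid appears in a quarter of the sequences at that position; raising the target-base fraction increases that share at the expense of the wild-type and by-product residues.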

Figure 4b illustrates the Ser walk-through of VL CDR2 in each position (55-62). Here, the sequence at
positions 58 and 62 is unchanged as serine is present in the wild type sequence. Note that at position 61,
although four different nucleotide sequences are generated, only three different protein sequences would be
produced. This outcome is due to the fact that TAA codes for a stop codon.

Application of the method in this case can produce a library of Fv sequences which contains 112 x
196,608 x 96 = 2.11 x 10^9 different protein sequences. Again, a significant proportion of these sequences
will encode the catalytic Asp-His-Ser triad in the hypervariable regions.

Note that once a series of cassettes for a number of regions is designed, the series may be used in
any permutation desired. For example, degenerate oligonucleotides may be designed for the CDRs, and
these may be used together in any combination of regions and chains desired, as well as in different
structures (e.g., single VL or VH chains, Fv molecules, single chain antibodies, full-size antibodies or
chimeric antibodies).

Model III

In another approach to the design of a serine protease, only the heavy chain of the Fv molecule is
used. Monomeric VH domains, known as single domain antibodies, with good antigen-binding affinities have
been prepared (Ward, E.S. et al., Nature 341: 544-546 (1989)). Thus, a single VH chain can provide a
scaffold for walk-through mutagenesis. For this model, Asp was selected for VH CDR1 (Figure 3a), His for
VH CDR2 and Ser for VH CDR3 (Figure 3b). Again, two of the degenerate nucleotide sequences described
in Model I can be reused (Figures 3a and 3b). Figure 5a shows the His walk-through in a portion of VH
CDR2.

Oligonucleotides comprising the windows shown in Figures 3a, 3b and Figure 5a and degenerate
oligonucleotides complementary to these windows have been made. Furthermore, using complementary
oligonucleotides, in addition to the degenerate oligonucleotides and their complements, a full length
double-stranded VH gene variant was assembled. The assembled gene variants have been cloned into the vector
pRB500 (Example 2), which contains the pelB leader sequence for secretion. These experiments are
described in Example 1.

Synthesis of these oligonucleotides and incorporation into the VH gene as described, in all possible
combinations, can theoretically generate 112 x 2^25 x 4096 = 1.54 x 10^13 different peptide sequences. Due
to the length of the region targeted in VH CDR2, a large number of variants are generated; however, a large
proportion of the variants will have the preselected amino acids.

As an alternative to using the VH CDR2 window shown in Figure 5a, another window encompassing a
different portion of VH CDR2 was designed (Figure 5b). In this window, certain positions in the region were
selected (see Model VI below for further explanation) and subjected to walk-through mutagenesis using His
as the preselected amino acid. If oligonucleotides designed as shown in Figure 5b are used instead of the
oligonucleotides of Figure 5a, 112 x 128 x 4096 = 5.87 x 10^7 different peptide sequences can be
generated.

Model IV

In another embodiment using the heavy chain of the Fv molecule, a different combination of windows is
used. The Asp window previously described for CDR1 (Figure 3a; Models I, III) and the His window
previously described for CDR3 (Figure 4a; Model II) are used with a new window in which Ser is walked
through the amino-terminal portion of VH CDR2 from amino acids 50-60. This walk-through mutagenesis is
illustrated in Figure 6.

Synthesis of these oligonucleotides and incorporation into the VH gene in all possible combinations can
generate 112 x 4096 x 196,608 = 9.02 x 10^10 different peptide sequences.

Model V

In another embodiment, a protein with an existing catalytic activity is altered to generate a different
mechanism of catalysis. In the process, the specificity and/or activity of the enzyme may also be altered. The
HIV protease was selected as an enzyme for mutagenesis. The HIV protease is an aspartic protease and
has an Asp-Thr-Gly sequence typical of aspartic proteases, which contain a conserved Asp-Thr(Ser)-Gly
sequence at the active site (Toh et al., EMBO J. 4: 1267-1272 (1985)). For walk-through mutagenesis, the
Asp-Thr-Gly sequence in the protease was selected as a target for mutagenesis. Walk-through mutagenesis
was repeated three different times with three preselected amino acids, Asp, His and Ser. This approach is
intended to result in the conversion of an aspartic protease into a serine protease and an alteration of the
mechanism of catalysis. In addition, mutants of the HIV aspartic protease with altered activity, specificity, or
an altered mechanism of catalysis are expected.

Figure 7 shows the three residues or window to be altered and illustrates three sequential walk-through
procedures with Asp, His and Ser. At the first position, which is an Asp residue, only His and Ser are
introduced. At the two remaining positions, Asp, His, and Ser are each introduced. Note that in the second
position of the second codon and in the second position of the third codon, the A required in the His
walk-through has already been introduced in the Asp walk-through (Figure 7). The sequence of the mixed probe,
which includes 324 different sequences, and the encoded amino acids are also shown in Figure 7. This
mutagenesis protocol will generate 324 different peptide sequences in the active site window.
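
The figure of 324 can be reproduced by enumeration under one set of assumptions. In the sketch below the wild-type coding sequence GAC ACT GGT is taken from the text, but the particular Ser codons (TCC at the first position, AGT at the second and third) are an inference, since Figure 7 is not reproduced here; other codon choices could also be consistent with the stated total. All names are illustrative.

```python
from itertools import product
from math import prod

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Wild-type active site window: Asp-Thr-Gly.
WILD = ["GAC", "ACT", "GGT"]
# Codons walked through each position. The Ser codons are ASSUMED for this sketch.
TARGETS = [
    ["CAC", "TCC"],         # position 1 is already Asp: only His and Ser are added
    ["GAT", "CAT", "AGT"],  # Asp, His, Ser
    ["GAT", "CAT", "AGT"],  # Asp, His, Ser
]


def base_sets(wild, targets):
    """Bases needed at each codon position to cover wild type and all targets."""
    return [sorted({wild[i]} | {t[i] for t in targets}) for i in range(3)]


window = [base_sets(w, t) for w, t in zip(WILD, TARGETS)]
n_nucleotide_sequences = prod(len(bases) for codon in window for bases in codon)
residues = [{CODON_TABLE["".join(c)] for c in product(*codon)} for codon in window]
n_peptides = prod(len(r) for r in residues)

if __name__ == "__main__":
    print(window)
    print(n_nucleotide_sequences, n_peptides)
```

Under these assumptions every degenerate codon encodes a distinct amino acid and none is a stop codon, so the number of peptide sequences equals the number of nucleotide sequences.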

For mutagenesis and expression of the HIV protease, plasmid pRB505 was constructed as described in
Example 2. This plasmid will direct expression of the HIV protease from an inducible tac promoter (de Boer,
H.A. et al., Proc. Natl. Acad. Sci. USA 80: 21 (1983)). In pRB505, the protease gene sequence is fused in
frame to the 3' end of a sequence encoding the pelB leader sequence of pectate lyase, so that the protease
can be secreted into the periplasmic space of E. coli. The construct is designed so that the leader
sequence is cleaved and the naturally occurring N-terminal sequence of the protease is generated.
Secretion of the HIV protease will facilitate assaying and purification of variants generated by mutagenesis.
The complement of the mixed probe shown in Figure 7 was synthesized, and a partially complementary
oligonucleotide was also synthesized. These oligonucleotides are designed to allow production of a
double-stranded sequence with convenient XhoI (CTCGAG) and BstEII (GGTNACC) restriction sites (underlined)
flanking the active site window. (Note that the complement of the active site window's coding sequence was
synthesized. Thus, the nucleotide sequence for the wild type for the active site window (5'-ACC AGT GTC-3')
shown below is the complement of 5'-GAC ACT GGT-3', the latter of which codes for Asp-Thr-Gly.)
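
The relationship between the coding sequence and the synthesized strand can be verified with a short reverse-complement check. This is a generic illustration; the helper name is hypothetical.

```python
# Watson-Crick complement for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Complementary strand of a DNA sequence, written 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    coding_window = "GACACTGGT"  # Asp-Thr-Gly
    print(reverse_complement(coding_window))
    # The XhoI site is palindromic, so it reads the same on both strands.
    print(reverse_complement("CTCGAG"))
```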

                                          G  TC   G
                                         TT  CG  GA
5'- CAT TTC CTC GAG AAC GGT GTC ATC AGC ACC AGT GTC -
                                          WINDOW

    CAG CAG AGC TTC CTT TAG TTG ACC ACC GAT TTT GAT GGT -
                                      3'-TA AAA CTA CCA

    AAC CAG TGG -3'
    TTG GTC ACC TGC GAC GGT GTC TCA CTA AAC G -5'

The oligonucleotides were annealed and extended in a reaction using the Klenow fragment of DNA
polymerase. Extension of the short complementary oligonucleotide generates the complement of each of
the variant oligonucleotides. The reaction mix was digested with BstEII and XhoI and the products were
separated on an 8% polyacrylamide gel. A 106 bp band was recovered from the gel by electroelution. This
band, containing the active site window fragments, was cloned between the BstEII and XhoI sites of
pRB505, and the ligated plasmids were introduced into a TG1/pACYC177 lacIq strain. The resulting
transformants were plated on LB amp plates, and yielded about 1000 colonies.

The colonies were screened using the protease screening assay described in Example 4. Ampicillin
resistant colonies were screened for proteolytic activity by replica plating onto nutrient agar plates
containing 2 mM IPTG for induction of expression, and either dry milk powder (3%) or hemoglobin as a
protease substrate as described in Example 4. In this assay, if a colony secretes proteolytic activity leading
to degradation of the substrate in the plate (e.g., dry milk), a zone of clearing appears against the opaque
background of the plate. Because the wild-type HIV protease does not show activity in the assay (due to its
substrate specificity), novel activities can be distinguished from the original activity. Preliminary data
indicate that transformants with novel activity can be generated by the described procedure.

The novel variants generated can be screened further for acquisition of a different mechanism of action
by differential inhibition with protease inhibitors. For example, serine proteases are inhibited by PMSF
(phenylmethylsulfonyl fluoride), DFP (diisopropylphosphofluoridate) and TLCK
(L-1-chloro-3-(4-tosylamido)-7-amino-2-heptanone hydrochloride). Transformants which generate a halo on plates can be grown in liquid
media, and extracts from the cultures can be assayed in the presence of the appropriate inhibitors.
Reduced activity in the presence of a serine protease inhibitor as compared to activity in the absence of
such an inhibitor will be indicative that a variant functions with a serine protease catalytic mechanism.
Among the variants generated by the walk-through mutagenesis procedure will be variants with altered
activity, altered specificity, a serine protease mechanism or a combination of these features. These variants
can be further characterized using known techniques.

Model VI

In this embodiment, walk-through mutagenesis of five out of six CDRs of the MCPC 603 Fv molecule is
performed, and Asp, His and Ser are the preselected amino acids. In this model, "walk-through"
mutagenesis is carried out from two to three times with a different amino acid in a given region or domain.
For example, Ser and His are sequentially walked-through VL CDR1 (Figure 8a), and Asp and Ser are
sequentially walked-through VL CDR3 (Figure 8b). VL CDR2 was not targeted for mutagenesis because
structural studies indicated that this region contributes little to the binding site in MCPC 603.
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In CDR1 of the VH chain of the Fv, Asp and His are walked through (Figure 8c). Ser can be introduced 
at two positions in CDR1 with a single base change (Figure 8c, positions 32 and 33). In VH CDR2. His and 
Ser are the preselected amino acids used (Figure Bd) and in VH CDR3, Asp, His and Ser are each walked 
through the amino terminal five positions of CDR3 (Figure 8e). 

Furthermore, in this embodiment not all amino acids in a given region are mutagenized, although they
do not contain the preselected amino acid as the wild type residue. For example, in Figure 8d, only
positions 50, 52, 56, 58 and 60 are mutagenized. Similarly, in Figures 8a-d, it can be seen that one or more
residues in the region are not mutagenized. Mutagenesis of noncontiguous residues within a region can be
desirable if it is known, or if one can guess, that certain residues in the region will not participate in the
desired function. In addition, the number of variants can be minimized.

For example, in the case of a serine protease, a design factor is the distance between the
preselected amino acids. In order to form a catalytic triad, the residues must be able to hydrogen bond with
one another. This consideration can impose a proximity constraint on the variants generated. Thus, only
certain positions within the CDRs may permit the amino acids of the catalytic triad to interact properly.
Thus, molecular modeling or other structural information can be used to enrich for functional variants.

In this case, known structural information was used to identify residues in the regions that may be close
enough to permit hydrogen bonding between Asp, His and Ser, as well as the range of residues to be
mutagenized. Roberts et al. have identified regions of close contact between portions of the CDRs (Roberts,
V.A. et al., Proc. Natl. Acad. Sci. USA 87: 6654-6658 (1990)). This information together with data from the
x-ray structure of MCPC 603 were used to select promising areas of close contact among the CDRs targeted
for mutagenesis.

If the mutagenesis is carried out as illustrated and the regions are randomly combined, then 17,280 x
27,648 x 432 x 2304 x 7776 = 5.2 x 10^18 different peptide sequences can be generated.
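
The combinatorial estimate above can be checked directly. The following is a minimal sketch, assuming the five per-region counts are exactly as printed and that randomly combining independently mutagenized regions multiplies their counts; the variable names are illustrative only. With the factors as printed, the product evaluates to roughly 3.7 x 10^18, so either the stated total or one of the printed factors may have been altered in reproduction.

```python
from math import prod

# Per-region variant counts as printed in the text, one factor per
# mutagenized CDR (assumed to be reproduced correctly).
region_variants = [17_280, 27_648, 432, 2_304, 7_776]

# Randomly combining independently mutagenized regions multiplies the counts.
total_sequences = prod(region_variants)

print(f"{total_sequences:.2e}")
```

The same product applies to any set of independently varied regions, which is why restricting the positions mutagenized in each region reduces the library size so sharply.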

Model VII

In each of the embodiments described above, mutagenesis is designed to create clusters of catalytically
active residues. In the embodiment of Model VII, mutagenesis is designed to create a novel binding
function. In this embodiment, residues implicated in the binding or chelating of a co-factor (e.g., Fe+++)
are introduced into regions of a molecule, in this case MCPC 603. Many enzymes use metal ions as
cofactors, so it is desirable to generate such binding sites as a first step towards engineering such
enzymes.

In this embodiment two histidine and two tyrosine residues are introduced into the CDRs of MCPC 603.
Dioxygenases, which are members of the class of oxidoreductases, and which catalyze the oxidative
cleavage of double bonds in catechols, contain a bound iron at their active sites. Spectroscopic analysis and
X-ray crystallography indicate that the ferric ion at the active site of the dioxygenases is bound by two
tyrosine and two histidine residues.

The histidine windows designed for MCPC 603 (see e.g., Figure 3c, VL CDR2; Figure 4a, VH CDR3;
and Figure 5a, VH CDR2) can be used to introduce histidine residues into one or more domains of MCPC
603, or additional windows can be designed. Similarly, one or more CDRs of MCPC 603 can be targeted
for walk-through mutagenesis with tyrosine. Using these cassettes, variants with 2 histidine and 2 tyrosine
residues in a large variety of combinations and in different regions can be produced.

These variants can be screened for acquisition of metal binding. For example, pools of colonies can be
grown and a periplasmic fraction can be prepared. The proteins in the periplasmic fraction of a given pool
can be labeled with an appropriate radioactive metal ion (e.g., a radioactive isotope of Fe) and the presence
of a metal binding variant can be determined using high sensitivity gel filtration. The presence of
radioactivity in the protein fraction from gel filtration is indicative of metal binding. Pools can be
subdivided and the process repeated until a mutant is isolated.

Alternatively, a nitrocellulose filter assay can be used. Colonies of a strain which secretes the mutant
proteins and which allows the proteins to leak into the medium can be grown on nitrocellulose filters. The
mutant proteins leaking from the colonies can bind to the nitrocellulose and the presence of metal binding
proteins can be ascertained by probing with radiolabeled metal ions.

Generation of a metal binding site in the VL chain could provide a metal binding site for a catalytic VH
chain. Production of Fv from these component chains could allow enhancement of catalysis mediated by
one chain by co-factor binding in the other chain.

The present invention is further illustrated in the following examples.

Example 1

Construction of a VH Variant

Oligonucleotide Synthesis

β-cyanoethyl phosphoramidites and polymer support (CPG) columns were purchased from Applied
Biosystems, Inc. (Foster City, CA). Anhydrous acetonitrile was purchased from Burdick and Jackson (Part
no. 015-4). Oligonucleotides were synthesized on an Applied Biosystems Model 392 using programs
provided by the manufacturer (Sinha, N.D., et al., Nucleic Acids Res., 12: 4539 (1984)). On completion of
the synthesis, the oligonucleotide was freed from the support and the protecting cyanoethyl groups were
removed by incubation in concentrated NH4OH. Following electrophoresis on a 10% polyacrylamide gel,
oligomers were excised from the gel, electroeluted, purified on C18 columns, freeze dried and dissolved in
the appropriate buffer at a final concentration of 1 µg/ml.

Oligonucleotides

In order to construct the VH variant described in Model III, the following oligonucleotides and their
complements (also shown), ranging in length from 30-54 bases, were designed and synthesized as
described. Codon utilization was adjusted to reflect the most frequently used E. coli codons.

A/a: 910372/910373

5'- AAG AAT TCC ATG GAA GTT AAA CTG GTA GAG -3'

5'- ACC ACC AGA CTC TAG GAG TTT AAG TTG GAT GGA ATT-
GTT- 3'

B/b: 910374/910375

5'- TCT GGT GGT GGT GTG GTA GAG CGG GGT GGA TCC-
CTG- 3'

5'- AGA GAG ACG GAG GGA TCC ACC CGG GTG TAG GAG -
ACC -3'

C/c: 910376/910377

5'- CGT CTG TCT TGC GGT ACC TCA GGT TTG -3'
5'- AGA GAA GGT GAA ACC TGA GGT AGC GGA -3'

D/d: 910378/910379

GA G GAT T 
5'- AGC TTC TCT GAC TTC TAG ATG GAG TGG GTA CGT-
CAG-3' 



A ATC C TC 
5'-ACC CGG GGG CTG ACG TAG CCA CTC CAT GTA GAA- 
GTC -3' 

E/e: 910380/910381

5'- CCC CGG GGT AAA CGT CTC GAG TGG ATC GCA GCT-
AGC- 3'

5'- GTT ACG GCT AGC TGG GAT CCA CTC GAG ACG TTT -3'

F/f: 910382/910383

CA C . C T C CA CA C T C CA 
5' -CGT AAC AAA GGT AAC AAG TAT ACT ACT GAA TAG AGC 

CA CA CA C C CA 

GCT TCT GTT AAA GGT CGT -3' 

TG G G TG TG TG TG GAG TG 
5' -GAT GAA ACG AGC TTT AAC AGA AGC GCT GTA TTC ACT 

TG GAG G TG 
ACT ATA GTT GTT ACG TTT -3' 






EP 0 527 809 B1 



G/g: 910384/910385

5'- TTC ATC GTT TCT CGT GAG ACT AGT CAA TCG ATC CTG 
TAG CTG- 3' 

5'- ATT CAT CTG CAG GTA GAG GAT CGA TTG ACT AGT GTC 
ACG AGA AAC- 3' 



H/h: 910386/910387

5'- CAG ATG AAT GCA TTG CGT GCT GAA GAG ACC GCT ATC 
TAG- 3' 

5'-GGC GCA GTA GTA GAT AGC GGT GTC TTC AGC ACG CAA- 
TGC- 3' 



I/i: 910388/910389 OR 9104103/9104104

G C C A G C C 

5 '-TAG TGC GCG CGT AAC TAG TAT GGC AGC ACT TGG TAC- 

C TC TC 
TTC GAG GTT TGG -3' 

GA GA G G G C T 
5' -ACC TGC ACC CCA AAC GTC GAA GTA CCA AGT GCT GCC 

GGC 
ATA GTA GTT- 3'

J/j: 910390/910391



5'- GGT GCA GGT ACC AAC GTT ACC GTT TCT TGA TAG CAG- 
GTA AGC TTA A -3' 

5'- TTA AGC TTA CCT GCT ATC AAG AAA CGG TAA CGG TGG
T -3'



Gene Assembly

These pairs of oligonucleotides can be assembled into a VH gene as depicted below: 
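
[Schematic: the oligonucleotide pairs laid end to end in alphabetical order along the VH gene, each
annealed to its complement]
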

Pairs D/d, F/f, and I/i are degenerate and complementary oligonucleotides encompassing the "windows"
depicted in Figure 3a, Figure 5a, and Figure 3b, respectively. The design of the other oligonucleotides was
similar to that described by Pluckthun et al., and included the introduction of a series of restriction sites
(EcoRI, NcoI, BamHI, SauI, XmaI, XhoI, NheI, AccI, HaeII, SpeI, ClaI, PstI, NsiI, BssHII, KpnI, and HindIII)
useful for further manipulations (see Pluckthun, A. et al., Cold Spring Harbor Symp. Quant. Biol. 52:
105-112 (1987)). For gene assembly (Alvarado-Urbina, G. et al., Biochem. Cell Biol. 64: 548-555 (1986)),
eighteen of the oligonucleotides (B-I, b-i) were phosphorylated using T4 polynucleotide kinase. Each of ten
complementary pairs was annealed separately. The annealed pairs were then mixed and ligated together
using T4 DNA ligase. The product is shown schematically below:

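[Schematic of the assembled gene: EcoRI and NcoI sites at one end, a HindIII site at the other, with the
positions of CDR1, CDR2 and CDR3 marked along the gene]
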

The synthetic gene was designed to contain restriction sites for cloning. Following ligation, the fully
assembled molecules were cleaved with NcoI and HindIII, gel purified, and inserted into vector pRB500 (see
Example 2) at the NcoI and HindIII sites. About 1500 transformants above the background were obtained on
LB amp plates. The resulting constructs should contain the VH gene variants fused in frame to the pelB
signal peptide.

Example 2

Construction of pRB505

Construction of pRB500



Two complementary oligonucleotides which code for the pelB leader sequence (Lei, S.P. et al., J.
Bacteriol. 169: 4379 (1987)) were chemically synthesized. The oligonucleotides, which were designed to
have 5' and 3' overhangs complementary to NcoI and PstI sites, were hybridized and cloned into the PstI
and NcoI sites of vector pKK233.2 (Pharmacia). The oligonucleotides are shown below:

5'- C ATG AAA TAC CTA TTG CCT ACG GCA GCC GCT GCA-
3'- TTT ATG GAT AAC GGA TGC CGT CGG CGA CGT 

TTG TTA TTA GCT GCC CAA CCA GCC ATG GCG AAT TCC- 
AAC AAT AAT CGA CGG GTT GGT CGG_TAC_CGC TTA AGG 

CTG CA-3' 
G -5' 
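
The pairing of the two pelB oligonucleotides shown above can be checked base by base. The sketch below is illustrative only: the helper names are hypothetical, the single-stranded overhangs (5' C ATG on the upper strand, TG CA-3' at its other end) are left out, and the upper strand is entered with TAC and GCT at the two positions where the printed copy is unclear, an assumption based on the lower strand.

```python
# Watson-Crick pairing used to compare two antiparallel strands that are
# written one above the other (upper 5'->3', lower 3'->5').
PAIRING = str.maketrans("ACGT", "TGCA")

def complement(seq: str) -> str:
    """Return the base-by-base complement, ignoring spaces."""
    return seq.replace(" ", "").translate(PAIRING)

def mismatches(upper: str, lower: str) -> list[int]:
    """Positions at which the lower strand does not pair with the upper strand."""
    expected = complement(upper)
    observed = lower.replace(" ", "")
    if len(expected) != len(observed):
        raise ValueError("strands must cover the same duplex region")
    return [i for i, (e, o) in enumerate(zip(expected, observed)) if e != o]

# Duplex region only; overhangs omitted (see the assumptions stated above).
upper = ("AAA TAC CTA TTG CCT ACG GCA GCC GCT GCA "
         "TTG TTA TTA GCT GCC CAA CCA GCC ATG GCG AAT TCC C")
lower = ("TTT ATG GAT AAC GGA TGC CGT CGG CGA CGT "
         "AAC AAT AAT CGA CGG GTT GGT CGG TAC CGC TTA AGG G")

print(mismatches(upper, lower))  # an empty list means every base pairs
```

A check of this kind is a convenient way to catch transcription errors before oligonucleotides are ordered or synthesized.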



The resulting plasmid, pRB500, has an inducible tac promoter upstream of the ATG start codon of the
pelB sequence. There is a unique NcoI site (underlined) at the 3' end of the sequence coding for the pelB
leader into which a gene encoding a product to be secreted, such as the HIV protease or the VH or VL
regions of an antibody, may be inserted. (The NcoI site ligated to the 5' overhang of the fragment is not
regenerated.)



Construction of pRB503



The HIV protease gene was obtained from pUC18.HIV (Beckman, catalog # 267438). The gene can be
excised from this plasmid as a HindIII-EcoRI or HindIII-BamHI fragment. However, the HindIII site in the HIV
protease cannot be directly cloned in frame to the pelB leader sequence present in plasmid pRB500.
Therefore, a double-stranded oligonucleotide linker was designed so that the amino terminal methionine of
the HIV protease coding sequence could be joined in frame to the coding sequence of the pelB leader
peptide in pRB505. The following sequence was synthesized:

Met Ala Pro Gln Ile Thr ...



5'- AG CTT GCC ATG GCG CCG CAA ATC ACT CT- 3'
3'-      A CGG_TAC_CGC GGC GTT TAG TG -5'
           NcoI

This linker has a 5' HindIII overhang and a 3' DraIII overhang. The oligonucleotide was cloned into the unique
HindIII and DraIII sites in pUC18.HIV. The resulting plasmid is called pRB503. The linker introduces an NcoI
site into the vector at the initiator methionine of the HIV protease and reconstructs the sequence as found in
pUC18.HIV.
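
The reading frame of the linker can be confirmed by translating its upper strand from the first ATG. The following is a minimal sketch, assuming the standard genetic code and the upper-strand sequence as printed above; the function and variable names are illustrative only.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate_from_first_atg(dna: str) -> str:
    """Translate complete codons starting at the first ATG (one-letter code)."""
    seq = dna.replace(" ", "").upper()
    start = seq.find("ATG")
    if start < 0:
        return ""
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)

linker_upper = "AG CTT GCC ATG GCG CCG CAA ATC ACT CT"

print(translate_from_first_atg(linker_upper))          # Met Ala Pro Gln Ile Thr in one-letter code
print("CCATGG" in linker_upper.replace(" ", ""))       # NcoI recognition site present
```

The translation ends after Thr because the final two bases of the linker do not form a complete codon; the reading frame continues into the protease gene once the linker is cloned.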



Construction of pRB505

The HIV protease gene was isolated from pRB503 as an NcoI-EcoRI fragment and was cloned into the
unique NcoI and EcoRI sites of pRB500. In the final construct, the HIV protease is fused in frame to the
pelB leader sequence, and expression is driven by the inducible tac promoter. It is expected that the leader
peptidase will cleave the fusion protein between Ala and Pro (residues 2 and 3 above) of the HIV sequence,
thereby generating an N-terminal proline just as in the wild type HIV protease.

Example 3 

Walk-Through Mutagenesis of the HIV Protease Active Site

A degenerate oligonucleotide which spans the Asp-Thr-Gly active site residues of the HIV protease was 
designed and synthesized. This oligonucleotide has a sequence complementary to that shown in Figure 7. 

G TC G 

TT CG GA 

5'- CAT TTC CTC GAG AAC GGT GTC ATC AGC ACC ACT GTC- 

CAG GAG AGC TTC CTT TAG TTG ACC ACC GAT TTT GAT GGT- 
AAC CAG TGG - 3'



A second oligonucleotide, partially complementary to the above sequence, was synthesized to permit
conversion of the above degenerate oligonucleotides to double-stranded form. The complementary
oligonucleotide had the following sequence:

5'- GCA AAT CAC TCT GTG GCA GCG TCC ACT GGT TAC CAT-
CAA AAT - 3'

The degenerate oligonucleotides and complementary oligonucleotides were annealed.

G TC G
TT CG GA

5'- CAT TTC CTC_GAG AAC GGT GTC ATC AGC ACC ACT GTC-
            XhoI

GAG CAG AGC TTC CTT TAG TTG ACC ACC GAT TTT GAT-
                            3'- TA AAA CTA-

GGT AAC CAG TGG - 3'

CCA TTG GTC ACC TGC GAC GGT GTC TCA CTA AAC G- 5'


The oligos were extended using the Klenow fragment of DNA polymerase (Oliphant, A.R. and Struhl,
K., Methods Enzymol., 155: 568-582 (1987)). The resulting mixture was cleaved with BstEII and XhoI, and
separated on an 8% polyacrylamide gel. A 106 bp band containing the active site windows was isolated by
electroelution from a gel slice, extracted with phenol:chloroform, and ethanol precipitated.

Vector pRB505 was cleaved with BstEII and XhoI and then treated with calf intestinal alkaline
phosphatase to prevent religation. The vector band was purified from a low-melting agarose gel. The
purified BstEII-XhoI active site windows (100 nanograms) were cloned into the BstEII and XhoI sites of
pRB505 (500 nanograms). The ligation mix was used to transform a TG1/pACYC177 lacIq strain and
ampicillin resistant transformants were selected on LB amp plates (LB plus 50 µg/ml ampicillin; Miller, J.H.,
(1972), In: Experiments in Molecular Genetics, Cold Spring Harbor Laboratory (Cold Spring Harbor, NY), p.
433). Approximately 1000 transformants were obtained by this procedure. Several of these transformants
were tested for novel activity using the protease plate assay described below in Example 4.

Example 4 

Protease Activity Plate Assays

Sensitivity of the Plate Assay

In the case where the activity to be assayed is a proteolytic activity, substrate-containing nutrient plates
can be used for screening for colonies which secrete a protease. Protease substrates such as denatured
hemoglobin can be incorporated into nutrient plates (Schumacher, G.F.B. and Schill, W.B., Anal. Biochem.,
48: 9-26 (1972); Beynon and Bond, Proteolytic Enzymes, 1989 (IRL Press, Oxford) p. 50). When bacterial
colonies capable of secreting a protease are grown on these plates, the colonies are surrounded by a clear
zone, indicative of digestion of the protein substrate present in the medium.

A protease must meet several criteria to be detected by this assay. First, the protease must be
secreted into the medium where it can interact with the substrate. Second, the protease must cleave several
peptide bonds in the substrate so that the resulting products are soluble, and a zone of clearing results.
Third, the cells must secrete enough protease activity to be detectable above the threshold of the assay. As
the specific activity of the protease decreases, the threshold amount required for detection in the assay will
increase.

One or more protease substrates may be used. For example, hemoglobin (0.05 - 0.1%), casein (0.2%),
or dry milk powder (3%) can be incorporated into appropriate nutrient plates. Colonies can be transferred
from a master plate using an inoculating manifold, by replica-plating or other suitable method, onto one or
more assay plates containing a protease substrate. Following growth at 37°C (or the appropriate
temperature), zones of clearing are observed around the colonies secreting a protease capable of digesting
the substrate.

Four proteases of different specificities and reaction mechanisms were tested to determine the range of
activities detectable in the plate assay. The enzymes included elastase, subtilisin, trypsin, and
chymotrypsin. Specific activities (elastase, 81 U/mg powder; subtilisin, 7.8 U/mg powder; trypsin, 8600 U/mg
powder; chymotrypsin, 53 U/mg powder) were determined by the manufacturer. A dilution of each enzyme,
elastase, subtilisin, trypsin, and chymotrypsin, was prepared and 5 µl aliquots were pipetted into separate
wells on each of three different assay plates.

Plates containing casein, dry milk powder, or hemoglobin in a 1% Difco bacto agar matrix (10 ml per
plate) in 50 mM Tris, pH 7.5, 10 mM CaCl2 buffer were prepared. On casein plates (0.2%), at the lowest
quantity tested (0.75 ng of protein), all four enzymes gave detectable clearing zones under the conditions
used. On plates containing powdered milk (3%), elastase and trypsin were detectable down to 3 ng of
protein, chymotrypsin was detectable to 1.5 ng, and subtilisin was detectable at a level of 25 ng of protein
spotted. On hemoglobin plates, at concentrations of hemoglobin ranging from 0.05 to 0.1 percent, 1.5 ng
of elastase, trypsin and chymotrypsin gave detectable clearing zones. On hemoglobin plates, under the
conditions used, subtilisin did not yield a visible clearing zone below 6 ng of protein.

Assay of Variants of HIV Protease

Of the approximately 1000 ampicillin resistant transformants obtained by the procedure described in
Example 3, 300 colonies were screened using the protease plate screening assay. The ampicillin resistant
colonies were screened for proteolytic activity by replica plating onto nutrient agar plates (LB plus
ampicillin) with a top layer containing IPTG (isopropylthiogalactopyranoside) for induction of expression, and
either dry milk powder (3%) or hemoglobin as a protease substrate.

Protease substrate stock solutions were made by suspending 60 mg of hemoglobin or 1.8 g of
powdered milk in 10 ml of deionized water and incubating at 60°C for 20 minutes. The top layer was made
by adding ampicillin and IPTG to 50 ml of melted LB agar (15 g/l) at 60°C to final concentrations of 50
µg/ml and 2 mM, respectively, and 10 ml of protease substrate stock solution. 10 ml of the top layer was
layered onto LB amp plates.

Colonies secreting sufficient proteolytic activity which degrades the particular substrate in the plate
(e.g., dry milk) will have a zone of clearing around them which is distinguishable from the opaque
background of the plate. Whereas none of the transformants gave a zone of clearing on hemoglobin plates,
a large proportion of the transformants gave a zone of clearance on dry milk powder plates. Note that the
dry milk powder plates had been incubated at 37°C for about 1.5 days and then refrigerated. Although no
halos appeared after the 1.5 day incubation at 37°C, more than 90% of the colonies on the assay plates
had halos after 3 days in the refrigerator. Three sample colonies which produced halos on the assay plate
were streaked onto dry milk powder plates containing 2 mM IPTG. Two of the three streaks grew. Distinct
zones of clearing were again observed for these two isolates under the same conditions (grown overnight at
37°C, followed by refrigeration for three days). As a control, transformants of TG1/pACYC177 lacIq
containing either pRB500, which encodes the pelB signal sequence, but no HIV protease, or containing
pRB505, which encodes the pelB signal sequence fused to the "wild type" HIV protease, were also
streaked onto dry milk powder plates with 2 mM IPTG. In contrast to the transformants obtained from the
mutagenesis, these control transformants did not give a zone of clearance on dry milk powder plates. This
observation is consistent with previous results indicating that retroviral proteases are selective for viral target
proteins (Skalka, A.M., Cell 56: 911-913 (1989)). Using this assay, novel protease activities generated by the
walk-through mutagenesis procedure can be differentiated from the wild type HIV protease by altered
substrate specificities.
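
The top-layer recipe given earlier in this example can be checked against the substrate concentrations quoted for the assay plates. The following is a minimal sketch, assuming that the 10 ml of substrate stock is added to the 50 ml of melted agar and that the volumes of the ampicillin and IPTG additions are negligible; the function and variable names are illustrative only.

```python
def percent_w_v(mass_g: float, volume_ml: float) -> float:
    """Weight/volume percentage: grams of solute per 100 ml of final volume."""
    return 100.0 * mass_g / volume_ml

AGAR_ML = 50.0   # melted LB agar
STOCK_ML = 10.0  # protease substrate stock solution added to the agar
# Assumption: ampicillin and IPTG additions do not change the volume appreciably.
TOTAL_ML = AGAR_ML + STOCK_ML

hemoglobin_pct = percent_w_v(0.060, TOTAL_ML)  # 60 mg of hemoglobin in the stock
milk_pct = percent_w_v(1.8, TOTAL_ML)          # 1.8 g of powdered milk in the stock

print(f"hemoglobin: {hemoglobin_pct:.2f}% (w/v)")
print(f"dry milk:   {milk_pct:.2f}% (w/v)")
```

Under these assumptions the top layer contains about 0.1% hemoglobin or 3% dry milk powder, in line with the substrate concentrations used in the sensitivity tests.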

Equivalents

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation,
many equivalents to the specific embodiments of the invention described herein. Such equivalents are
intended to be encompassed by the following claims.

Claims

1. A method of generating a protein library comprising a heterogeneous mixture of mutant proteins, said
method comprising the step of specifically substituting each amino acid in one or more predefined
regions of a protein with any one of one or more predetermined amino acids without saturation, the
substitution of each predetermined amino acid being effected at essentially each and every sequence
position of each predefined region of the protein in turn.

2. A method of generating a protein library comprising a heterogeneous mixture of mutant proteins, said
method comprising the step of specifically substituting each amino acid in one or more predefined
regions of a protein with any one of two predetermined amino acids, the substitution of each
predetermined amino acid being effected at essentially each and every sequence position of each
predefined region of the protein in turn.

3. A method of generating a protein library comprising a heterogeneous mixture of mutant proteins, said
method comprising the step of specifically substituting each amino acid in a predefined region of the
protein with a single predetermined amino acid, the substitution being effected at essentially each and
every sequence position of the predefined region of the protein in turn.

4. The method of any one of Claims 1-3, wherein (a) a predetermined amino acid is introduced into two or
more predefined regions of the protein; or (b) the predetermined amino acid is Ser, Thr, Asn, Gln, Tyr,
Cys, His, Glu, Asp, Lys or Arg; or (c) the proportion of mutant proteins containing at least one residue
of the predetermined amino acid in the predefined region ranges from about 12.5% to 100% of all
mutant proteins in the library, and for example wherein the library comprises mutant proteins containing
the predetermined amino acid in from one to all positions in the predefined region.

5. The method of any one of Claims 1-3, wherein substitution of the predetermined amino acid is effected
at each and every sequence position of the predefined region or at least one predefined region of the
protein.

6. The method of any one of Claims 1-3, wherein the predefined region or at least one predefined region
comprises a functional domain of the protein.

7. The method of Claim 6, wherein (a) the predefined region comprises a domain at or around the
catalytic site of an enzyme or a binding domain; or (b) the predefined region comprises a hypervariable
region of an antibody.

8. The method of any one of Claims 1-3, further comprising screening the library of mutant proteins to
select mutant proteins having a desired structure or function.

9. The method of any one of Claims 1-3, wherein one or more of the predetermined amino acids is 
selected from the group consisting of: Asp, His, Tyr and Ser. 


10. A library of mutant proteins prepared according to the method of any one of Claims 1-9. 

11. A method of mutagenesis of a gene encoding a protein, comprising: 

a) selecting a defined region of the amino acid sequence of the protein encoded by the gene to be
mutagenized;

b) determining a single amino acid residue to be inserted into amino acid positions in the defined
region;

c) synthesizing a mixture of oligonucleotides, comprising a nucleotide sequence for the defined
region, wherein each oligonucleotide contains, at each sequence position in the defined region,
either a nucleotide required for synthesis of the protein to be mutagenized or a nucleotide required
for a codon of the predetermined amino acid, the mixture containing all possible variant
oligonucleotides according to this criterion; and

d) generating an expression library of cloned genes containing said oligonucleotides.

12. A method of Claim 11, wherein (a) the defined region comprises a functional domain of the protein; or
(b) the defined region comprises a domain at or around the catalytic site of an antibody; or (c) the
defined region comprises a hypervariable region of an antibody; or (d) the predetermined amino acid is
Ser, Thr, Asn, Gln, Tyr, Cys, His, Glu, Asp, Lys or Arg.

13. The method of Claim 11, further comprising one or more of the following steps:

a) expressing said library of cloned genes to produce mutant proteins; and/or

b) screening said library of cloned genes or encoded proteins to select for a desired structure or
function.

14. A library of cloned genes prepared according to the method of Claim 11.

15. A method of producing a mutant protein having a desired structure or function by walk-through 
mutagenesis, comprising the steps of: 

a) selecting one or more defined region(s) of the amino acid sequence of the protein to be
mutagenized without saturation;

b) determining one or more amino acid residue(s) to be inserted into amino acid positions in said
defined region(s);

c) synthesizing a mixture of oligonucleotides, comprising a nucleotide sequence for said defined
region(s), wherein each oligonucleotide contains, at each sequence position in said defined
region(s), either a nucleotide required for synthesis of the protein to be mutagenized or a nucleotide
required for a codon of said predetermined amino acid(s), the mixture containing all possible variant
oligonucleotides according to this criterion;

d) generating an expression library of clones containing said oligonucleotides;

e) screening the library to detect a clone encoding a mutant protein having the desired structure or
function; and

f) expressing a mutant protein having the desired structure or function by virtue of the presence of
the oligonucleotide present in the clone detected in step (e).

16. The method of Claim 15, wherein (a) for at least one defined region, two predetermined amino acids
are selected to be inserted into amino acid positions in said defined region(s); or (b) for at least one
defined region, a single predetermined amino acid is selected to be inserted into amino acid positions
in said defined region.

17. A library of mutants of a protein, comprising mutant proteins in which, collectively, a predetermined
amino acid appears at least once in essentially every position in a region of the protein, wherein
mutants containing at least one residue of the predetermined amino acid in a region of the protein
comprise a proportion ranging from about 12.5% to 100% of the total number of different mutants in
the library.

18. A library of Claim 17, wherein (a) the mutant proteins contain the predetermined amino acid in from one
to all positions at once in the region, according to a statistical distribution; or (b) the protein is an
enzyme and the region is at or around the catalytic site; or (c) the protein is an antibody or portion
thereof and the region is a hypervariable region of the antigen-binding site; or (d) the predetermined
amino acid is selected from the group consisting of: Ser, Thr, Asn, Gln, Tyr, Cys, His, Glu, Asp, Lys
and Arg.

19. A library of HIV protease mutants, comprising mutant proteins in which, collectively, three predetermined
amino acids appear at least once in all positions of the active site region of the protease.

20. The library of Claim 19, wherein the three predetermined amino acids are Asp, His and Ser. 

21. A mutant protein of the library of Claim 20 wherein Asp, His and Ser appear in the active site region. 


22. A method of producing a mixture of oligonucleotides for mutagenesis without saturation of a nucleotide
sequence encoding a selected region of a protein to introduce a predetermined amino acid at each
position in the region, comprising synthesizing a mixture of oligonucleotides comprising a nucleotide
sequence for the preselected region, wherein each oligonucleotide contains, at each sequence position
in the selected region, either a nucleotide required for synthesis of the amino acid of the region or a
nucleotide required for a codon of the predetermined amino acid, the resulting mixture containing all
possible variant oligonucleotides containing either of the two nucleotides at each position.

23. A mixture of oligonucleotides produced by the method of Claim 22, wherein about 12.5% to 100% of
the oligonucleotides contain at least one codon for a single, predetermined amino acid.

24. An instrument for DNA synthesis having ten reagent vessels, each of four vessels containing a different
one of the four nucleotide synthons corresponding to the four nucleotides of DNA and each of the six
remaining vessels containing one of the six different mixtures of two synthons.
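
By way of illustration only, and forming no part of the claims, the oligonucleotide mixture recited in Claims 11, 15 and 22 and the vessel count of Claim 24 can be sketched as follows. The two-codon region, the choice of TCT as the serine codon and all function names are hypothetical and serve only as an example.

```python
from itertools import combinations, product

BASES = "ACGT"

def allowed_bases(wild_type: str, target_codon: str) -> list[set[str]]:
    """At each nucleotide position allow the wild-type base or the base
    occupying the same codon position in the predetermined amino acid's codon."""
    if len(wild_type) % 3 != 0 or len(target_codon) != 3:
        raise ValueError("sequences must consist of whole codons")
    return [{base, target_codon[i % 3]} for i, base in enumerate(wild_type)]

def enumerate_mixture(wild_type: str, target_codon: str) -> list[str]:
    """All oligonucleotides containing either of the allowed bases at each position."""
    choices = [sorted(s) for s in allowed_bases(wild_type, target_codon)]
    return ["".join(p) for p in product(*choices)]

# Hypothetical region: two codons (GCT GGT, Ala-Gly) walked through with Ser (TCT).
region = "GCTGGT"
mixture = enumerate_mixture(region, "TCT")
print(len(mixture), "oligonucleotides in the mixture")

# A position needs either a single synthon or a mixture of two synthons,
# which gives four single-base vessels plus one vessel per pair of bases.
two_synthon_mixtures = ["".join(pair) for pair in combinations(BASES, 2)]
print(len(BASES) + len(two_synthon_mixtures), "reagent vessels")
```

In this example the mixture contains the wild-type sequence, the fully substituted sequence and every intermediate; some intermediates carry codons for a third amino acid, which is an inherent by-product of mixing bases position by position.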

35 

Patentanspruche 

1. Ein Verfahren zur Bildung einer Protein-Bibliothek, die eine heterogene Mischung mutierter Proteine 
umfaflt, wobei besagtes Verfahren den Schritt der spezifischen Substitution jeder Aminosaure in einer 
40 Oder mehreren zuvor bestimmten Regionen eines Proteins mit einer Oder mehreren beliebigen, zuvor 
bestimmten Aminosauren ohne Sattigung aufweist, wobei die Substitution jeder zuvor bestimmten 
Aminosaure im wesentlichen in alien und jeder Sequenzposition von jeder zuvor bestimmten Region 
des Proteins der Reihe nach bewerksteliigt wird. 

45 2. Ein Verfahren zur Bildung einer Protein-Bibliothek, die eine heterogene Mischung mutierter Proteine 
umfaGt, wobei besagtes Verfahren den Schritt der spezifischen Substitution jeder Aminosaure in einer 
Oder mehreren zuvor bestimmten Regionen eines Proteins mit einer beliebigen von zwei zuvor 
bestimmten Aminosauren aufweist, wobei die Substitution jeder zuvor bestimmten AminosSure im 
wesentlichen in alien und jeder Sequenzposition von jeder zuvor bestimmten Region des Proteins der 

50 Reihe nach bewerksteliigt wird. 

3. A method of forming a protein library comprising a heterogeneous mixture of mutated proteins,
said method comprising the step of specifically substituting each amino acid in a predetermined
protein region with a single predetermined amino acid, wherein the substitution is carried out in
turn at substantially each and every sequence position of the predetermined protein region.

4. The method according to any one of Claims 1 to 3, wherein (a) a predetermined amino acid is
introduced into two or more predetermined protein regions; or (b) the predetermined amino acid is
Ser, Thr, Asn, Gln, Tyr, Cys, His, Glu, Asp, Lys or Arg; or (c) the proportion of mutated proteins
containing at least one residue of the predetermined amino acid in the predetermined region is in
the range of about 12.5% to 100% of all mutated proteins of the library, and wherein, for example,
the library comprises mutated proteins that contain the predetermined amino acid at from one up to
all positions in the predetermined region.

5. The method according to any one of Claims 1 to 3, wherein the substitution of the predetermined
amino acid is carried out at each and every sequence position of the predetermined region or of at
least one predetermined region of the protein.

6. The method according to any one of Claims 1 to 3, wherein the predetermined region, or at least
one predetermined region, comprises a functional domain of the protein.

7. The method according to Claim 6, wherein (a) the predetermined region comprises a domain at or
near the catalytic site of an enzyme, or a binding domain; or (b) the predetermined region
comprises a hypervariable region of an antibody.

8. The method according to any one of Claims 1 to 3, further comprising screening the library of
mutated proteins to select mutated proteins having a desired structure or function.

9. The method according to any one of Claims 1 to 3, wherein one or more of the predetermined
amino acids is Asp, His, Tyr or Ser.

10. A library of mutated proteins formed by the method according to any one of Claims 1 to 9.

11. A method of mutagenesis of a gene encoding a protein, the method comprising:

a) selecting a defined region of the amino acid sequence of the protein encoded by the gene to be
mutagenized;

b) determining a single amino acid residue to be inserted into amino acid positions in the defined
region;

c) preparing a mixture of oligonucleotides comprising a nucleotide sequence for the defined region,
wherein each oligonucleotide contains, at each sequence position in the defined region, either a
nucleotide required for synthesis of the protein to be mutagenized or a nucleotide required for a
codon of the predetermined amino acid, the mixture containing all possible variant
oligonucleotides according to this criterion; and

d) generating an expression library of cloned genes containing said oligonucleotides.

12. A method according to Claim 11, wherein (a) the defined region comprises a functional domain
of the protein; or (b) the defined region comprises a domain at or near the catalytic site of an
antibody; or (c) the defined region comprises a hypervariable region of an antibody; or (d) the
predetermined amino acid is Ser, Thr, Asn, Gln, Tyr, Cys, His, Glu, Asp, Lys or Arg.

13. The method according to Claim 11, further comprising one or more of the following steps:

a) expressing said library of cloned genes to produce mutated proteins; and/or

b) screening said library of cloned genes or encoded proteins to select for a desired structure or
function.

14. A library of cloned genes produced by the method according to Claim 11.

15. A method of producing a mutated protein having a desired structure or function by walk-through
mutagenesis, the method comprising the steps of:

a) selecting one or more defined regions of the amino acid sequence of the protein to be
mutagenized without saturation;

b) determining one or more amino acid residues to be inserted into amino acid positions in said
defined regions;

c) preparing a mixture of oligonucleotides comprising a nucleotide sequence for said defined
regions, wherein each oligonucleotide contains, at each sequence position in said defined regions,
either a nucleotide required for synthesis of the protein to be mutagenized or a nucleotide
required for a codon of said predetermined amino acids, the mixture containing all possible
variant oligonucleotides according to this criterion;

d) generating an expression library of clones containing said oligonucleotides;

e) screening the library to detect a clone encoding a mutated protein having the desired structure
or function; and

f) expressing a mutated protein that has the desired structure or function by virtue of the
presence of the oligonucleotide contained in the clone detected in step (e).

16. A method according to Claim 15, wherein (a) for at least one defined region, two predetermined
amino acids are selected to be inserted into amino acid positions in said defined region or
regions; or (b) for at least one defined region, a single predetermined amino acid is selected to
be inserted into an amino acid position in said defined region.

17. A library of mutants of a protein, comprising mutated proteins in which, collectively, a
predetermined amino acid occurs at least once at substantially every position in a region of the
protein, wherein mutants containing at least one residue of the predetermined amino acid in a
region of the protein make up a proportion in the range of about 12.5% to 100% of the total number
of different mutants in the library.

18. A library according to Claim 17, wherein (a) the mutated proteins simultaneously contain the
predetermined amino acid at from one up to all positions of the region according to a statistical
distribution; or (b) the protein is an enzyme and the region is at or near the catalytic site; or
(c) the protein is an antibody or a part thereof and the region is a hypervariable region of the
antigen-binding site; or (d) the predetermined amino acid is Ser, Thr, Asn, Gln, Tyr, Cys, His,
Glu, Asp, Lys or Arg.

19. A library of HIV protease mutants comprising mutated proteins in which, collectively, three
predetermined amino acids occur at least once at all positions in the region of the active site of
the protease.

20. The library according to Claim 19, wherein the three predetermined amino acids are Asp, His
and Ser.

21. A mutated protein of the library according to Claim 20, wherein Asp, His and Ser occur in the
region of the active site.

22. A method of producing a mixture of oligonucleotides for mutagenesis, without saturation, of a
nucleotide sequence encoding a selected region of a protein so as to introduce a predetermined
amino acid at each position in the region, comprising preparing a mixture of oligonucleotides
comprising a nucleotide sequence for the selected region, wherein each oligonucleotide contains,
at each sequence position in the selected region, either a nucleotide required for synthesis of
the amino acid of the region or a nucleotide required for a codon of the predetermined amino acid,
the resulting mixture containing all possible variant oligonucleotides containing either of the
two nucleotides at each position.

23. A mixture of oligonucleotides produced by the method according to Claim 22, wherein about
12.5% to 100% of the oligonucleotides contain at least one codon for a single, predetermined
amino acid.

24. An apparatus for DNA synthesis having ten reagent vessels, each of four vessels containing a
different one of the four nucleotide synthons corresponding to the four nucleotides of DNA, and
each of six vessels containing one of the six different mixtures of two synthons.



Claims (French text: Revendications)

1. A method for producing a protein library comprising a heterogeneous mixture of mutant proteins,
said method comprising the step of specifically substituting each amino acid in one or more
predefined regions of a protein with any one amino acid of a set of one or more predetermined
amino acids, without saturation, the substitution of each predetermined amino acid being carried
out in turn at substantially each of the sequence positions of each of the predefined regions of
the protein.

2. A method for producing a protein library comprising a heterogeneous mixture of mutant proteins,
said method comprising the step of specifically substituting each amino acid in one or more
predefined regions of a protein with either amino acid of a set of two predetermined amino acids,
the substitution of each predetermined amino acid being carried out in turn at substantially each
of the sequence positions of each of the predefined regions of the protein.

3. A method for producing a protein library comprising a heterogeneous mixture of mutant proteins,
said method comprising the step of specifically substituting each amino acid of a predefined
region of the protein with a single predetermined amino acid, the substitution being carried out
in turn at substantially each of the sequence positions of the predefined region of the protein.

4. A method according to any one of Claims 1 to 3, wherein (a) a predetermined amino acid is
introduced into two or more predefined regions of the protein; or (b) the predetermined amino acid
is Ser, Thr, Asn, Gln, Tyr, Cys, His, Glu, Asp, Lys or Arg; or (c) the proportion of mutant
proteins containing at least one residue of the predetermined amino acid in the predefined region
is about 12.5% to 100% of all the mutant proteins of the library, the library comprising, for
example, mutant proteins containing the predetermined amino acid at one or more, or even all,
positions of the predefined region.

5. A method according to any one of Claims 1 to 3, wherein the substitution of the predetermined
amino acid is carried out at every sequence position, without exception, of the predefined region
or of at least one predefined region of the protein.

6. A method according to any one of Claims 1 to 3, wherein the predefined region, or at least one
predefined region, comprises a functional domain of the protein.

7. A method according to Claim 6, wherein (a) the predefined region comprises a domain at or near
the catalytic site of an enzyme, or a binding domain; or (b) the predefined region comprises a
hypervariable region of an antibody.

8. A method according to any one of Claims 1 to 3, further comprising the step of screening the
library of mutant proteins in order to select mutant proteins having the desired structure or
function.

9. A method according to any one of Claims 1 to 3, wherein one or more of the predetermined amino
acids is selected from the group consisting of Asp, His, Tyr and Ser.

10. A library of mutant proteins prepared by the method of any one of Claims 1 to 9.

11. A method of mutagenesis of a gene encoding a protein, comprising:

(a) selecting a defined region of the amino acid sequence of the protein encoded by the gene to be
mutagenized;

(b) determining the single amino acid residue to be inserted at the amino acid positions in the
defined region;

(c) synthesizing a mixture of oligonucleotides comprising a nucleotide sequence for the defined
region, wherein each oligonucleotide contains, at each sequence position of the defined region,
either a nucleotide required for synthesis of the protein to be mutagenized or a nucleotide
required for a codon of the predetermined amino acid, the mixture containing all possible variant
oligonucleotides meeting this criterion; and

(d) producing an expression library of cloned genes containing said oligonucleotides.

12. A method according to Claim 11, wherein (a) the defined region comprises a functional domain
of the protein; or (b) the defined region comprises a domain at or near the catalytic site of an
antibody; or (c) the defined region comprises a hypervariable region of an antibody; or (d) the
predetermined amino acid is Ser, Thr, Asn, Gln, Tyr, Cys, His, Glu, Asp, Lys or Arg.

13. A method according to Claim 11, further comprising one or more of the following steps:

(a) expressing said library of cloned genes so as to produce the mutant proteins; and/or

(b) screening said library of cloned genes or of encoded proteins so as to select the desired
structure or function.

14. A library of cloned genes prepared by the method of Claim 11.

15. A method for producing a mutant protein having the desired structure or function by
walk-through mutagenesis, comprising the steps of:

(a) selecting one or more defined regions of the amino acid sequence of the protein to be
mutagenized, without saturation;

(b) determining one or more amino acid residues to be inserted at the amino acid positions in said
defined region or regions;

(c) synthesizing a mixture of oligonucleotides comprising a nucleotide sequence for said defined
region or regions, wherein each oligonucleotide contains, at each sequence position in each
defined region, either a nucleotide required for synthesis of the protein to be mutagenized or a
nucleotide required for a codon of said predetermined amino acid or acids, the mixture containing
all possible variant oligonucleotides meeting this criterion;

(d) generating an expression library of clones containing said oligonucleotides;

(e) screening the library so as to detect a clone encoding a mutant protein having the desired
structure or function; and

(f) expressing a mutant protein having the desired function or structure by virtue of the
oligonucleotide present in the clone detected in step (e).

16. A method according to Claim 15, wherein (a) for at least one defined region, two predetermined
amino acids are selected and inserted at the amino acid positions in said defined region or
regions; or (b) for at least one of the defined regions, a single predetermined amino acid is
selected to be inserted at the amino acid positions in said defined region.

17. A library of mutants of a protein, comprising mutant proteins in which, collectively, a
predetermined amino acid appears at least once at almost all positions of a region of the protein,
the proportion of mutant proteins containing at least one residue of the predetermined amino acid
in a region of the protein being about 12.5% to 100% of the total of the different mutant proteins
of the library.

18. A library according to Claim 17, wherein (a) the mutant proteins contain the predetermined
amino acid at from at least one up to all positions of the region, according to a statistical
distribution; or (b) the protein is an enzyme and the region is the catalytic site or a region
adjacent thereto; or (c) the protein is an antibody or a portion of an antibody and the region is
a hypervariable region of the antigen-binding site; or (d) the predetermined amino acid is
selected from the group consisting of Ser, Thr, Asn, Gln, Tyr, Cys, His, Glu, Asp, Lys and Arg.

19. A library of HIV protease mutants comprising mutant proteins in which, collectively, three
predetermined amino acids appear at least once at all positions of the active site region of the
protease.

20. A library according to Claim 19, wherein the three predetermined amino acids are Asp, His and
Ser.

21. A mutant protein of the library according to Claim 20, wherein Asp, His and Ser appear in the
active site region.

22. A method for producing a mixture of oligonucleotides for mutagenesis, without saturation, of a
nucleotide sequence encoding a selected region of a protein so as to introduce a predetermined
amino acid at each position of the region, comprising synthesizing a mixture of oligonucleotides
comprising a nucleotide sequence for the previously selected region, wherein each oligonucleotide
contains, at each sequence position in the selected region, either a nucleotide required for
synthesis of the amino acid of the region or a nucleotide required for a codon of the
predetermined amino acid, the resulting mixture comprising all possible variant oligonucleotides
containing one or the other of the two nucleotides at each position.

23. A mixture of oligonucleotides produced by the method of Claim 22, wherein about 12.5% to 100%
of the oligonucleotides contain at least one codon for a single predetermined amino acid.

24. An apparatus for DNA synthesis having ten reagent vessels, each of four vessels containing a
different one of the four nucleotide synthons corresponding to the four nucleotides of DNA, and
each of six vessels containing one of the six different mixtures of pairs of synthons.

FIG. 2